HIGH SPEED VIDEO FRAME BUFFER
PRIORITY 

This application is a division of Ser. No. 09/129,293 filed
Aug. 5, 1998, now U.S. Pat. No. 6,278,645, which is a
continuation of and claims priority from U.S. patent application
Ser. No. 08/832,708, filed Apr. 11, 1997, now U.S. Pat.
No. 5,864,512, entitled "High-Speed Video Frame Buffer
Using Single Port Memory Chips," the disclosure of which
is incorporated herein, in its entirety, by reference.

FIELD OF THE INVENTION 

This invention relates to providing high-speed video 
graphics through use of single ported memory chips on the 
video card. 

BACKGROUND ART 

High performance graphics processing commonly 
requires a specialized graphics frame buffer including a 
graphics engine in communication with a host processor 
over a bus. Control over a graphics frame buffer of this sort 
has been achieved by a variety of means, typically involving 
hardware configured to supervise the operation of the graphics
engine. The graphics engine is typically controlled
through commands from a host computer's processor over a 
bus so as to provide request code and data from the host 
processor to the graphics engine. High-performance frame 
buffers in the prior art have three general characteristics. 

First, the video board logic for performing texture
processing, i.e. the integrated circuit that performs those
functions, is separate from the circuitry for performing other
frame buffer manipulations, such as graphics display
requests. This results in limitations placed upon the performance
of the graphics system due to the frame buffer
designer's having to arrange for a communication path
between the texture processor and other components on the
board.

Second, prior art video frame buffers arrange video 
memory in a linear fashion, such that consecutive memory 
locations represent the next pixel upon a given row of the 
display. In effect, prior art video memory arrangements track 
the scanline of the display. 

Third, prior art video frame buffers store as one word in 
memory all information relevant to a particular display 
pixel. Consequently, acquiring the color value information
for displaying a row of pixels upon the display requires 
skipping through video memory to obtain the values. This 
can be a very inefficient process. 

The Edge III graphics processing system sold by Intergraph
Corporation, and described in a technical white paper titled
GLZ5 Hardware User's Guide, which is incorporated herein by
reference, exemplifies the state of the prior art in graphics
processing systems. However, the Edge III, like other prior
art video buffers, suffers from the three general limitations
referenced above: lack of integration, linear video buffer
memory, and consecutive placement of pixel information
within the frame buffer. These limitations result in a graphics
processing system that is not as efficient or speedy as it could
be. The present invention resolves these issues.

SUMMARY OF THE INVENTION 

The present invention, in accordance with a preferred
embodiment, provides a device for storing pixel information
for displaying a graphics image on a display. The information
includes an intensity value and a value associated with
each of a plurality of additional planes for each pixel. In this
embodiment, the device has a video frame buffer memory
having a series of consecutive addresses for storing information
to be output to the display. The buffer memory is
subdivided into a plurality of blocks, each block corresponding
to a region of the display having a plurality of contiguous
pixels. The device also has a processor for placing the pixel
information within the frame buffer memory so that in a
given block there are placed at a first collection of consecutive
addresses the intensity values for each of the pixels in
the block. (Typically the processor is implemented by one or
more resolvers.)

In a further embodiment, the frame buffer memory has a
single port.

In a further embodiment, the placement of pixel information
within the frame buffer includes a processor for placing
at a second collection of consecutive addresses values for
each of the pixels in the block associated with a first one of
the plurality of additional planes.

In another embodiment, the present invention provides a
device for storing pixel information for displaying a graphics
image on a display, the information including an intensity
value and a value associated with each of a plurality of
additional planes for each pixel. This embodiment has a
video frame buffer for storing information to be output to the
display, the buffer memory having a plurality of banks, each
bank being separately addressable and being subdivided into
a plurality of blocks, each block corresponding to a region
of the display having a plurality of contiguous pixels. This
embodiment also has a processor for placing the pixel
information within the frame buffer so that pixel information
relating to first and second contiguous blocks is stored in
different ones of the plurality of banks. In a further
embodiment, the buffer memory has two banks, a first bank
and a second bank, and the pixel information relating to first
and second contiguous blocks is stored in the first and
second banks respectively, so that there results a checkerboard
form of allocation of pixels of the image over the
display. In a further embodiment, the contiguous blocks are
rectangular in shape, each block having more than 4 pixels
on a side. In alternate embodiments, each block may have
more than 7 pixels on a first side, and more than 7, 15, 31,
63, or 79 pixels on a second side.
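
The checkerboard allocation described above can be sketched in a few
lines of Python. This is only an illustrative sketch: the 64 by 8 pixel
block size and the names block_of and bank_of are assumptions made for
the example, not values taken from the embodiments above. The sketch
shows that taking the parity of the block column plus the block row
places any two edge-adjacent blocks in different banks.

```python
BLOCK_W = 64  # assumed block width in pixels (hypothetical value)
BLOCK_H = 8   # assumed block height in pixels (hypothetical value)


def block_of(x, y):
    """Return (block_col, block_row) of the pixel block containing (x, y)."""
    return x // BLOCK_W, y // BLOCK_H


def bank_of(x, y):
    """Return bank 0 or 1 so that edge-adjacent blocks use different banks."""
    block_col, block_row = block_of(x, y)
    return (block_col + block_row) % 2
```

With these assumed sizes, bank_of(0, 0) and bank_of(64, 8) both select
bank 0, while bank_of(64, 0) and bank_of(0, 8) select bank 1, which is
the checkerboard pattern.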

In another embodiment, the invention provides a device
for storing pixel information for displaying a graphics image
on a display, the information including an intensity value
and a value associated with each of a plurality of additional
planes for each pixel. This embodiment has a video frame
buffer memory having a series of consecutive addresses for
storing information to be output to the display, the buffer
memory subdivided into a plurality of banks, each bank
being separately addressable and subdivided into a plurality
of blocks, each block corresponding to a region of the
display having a plurality of contiguous pixels; and a
processor for placing the pixel information within the frame
buffer so that, first, pixel information relating to first and
second contiguous blocks is stored in different ones of the
plurality of banks, and second, in a given block there are
placed at a first collection of consecutive addresses the
intensity values for each of the pixels in the block.

In a further embodiment, the buffer memory has two
banks, a first bank and a second bank, and the pixel information
relating to first and second contiguous blocks is stored in the
first and second banks respectively, so that there results a
checkerboard form of allocation of pixels of the image over
the display.

Related methods are also provided. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The following drawings are intended to provide a better 
understanding of the present invention, but they are in no 
way intended to limit the scope of the invention. 

FIG. 1 is a diagram showing the general structure of a
preferred embodiment of the graphics invention.

FIG. 2, represented by FIGS. 2A, 2B, and 2C, shows a 
chart showing a comparison between packed versus full 
pixel information storage. 

FIG. 3, represented by FIGS. 3A, 3B, 3C, and 3D, shows 
a chart showing memory to display address mapping. 

FIG. 4a is an example of memory within a video frame 
buffer. 

FIG. 4b is a chart showing an example of checkerboard 
memory addressing. 

FIG. 5, represented by FIGS. 5A and 5B, shows a chart
showing a texture processing memory interface for 2Mx8
SyncDRAMs.

FIG. 6, represented by FIGS. 6A and 6B, shows a chart 
showing a texture processing memory interface for 1Mx16 
SyncDRAMs. 

FIG. 7, represented by FIGS. 7A and 7B, shows a chart
showing a texture processing memory interface for 256x16
SyncDRAMs.

FIG. 8, represented by FIGS. 8A and 8B, shows a chart 
showing a texel mapping for 2Mx8 SyncDRAMs. 

FIG. 9, represented by FIGS. 9A and 9B, shows a chart 
showing a texel mapping for 1Mx16 SyncDRAMs. 

FIG. 10, represented by FIGS. 10A and 10B, shows a 
chart showing a texel mapping for 256x16 SyncDRAMs. 

DETAILED DESCRIPTION OF A SPECIFIC 
EMBODIMENT 

A preferred embodiment of the present invention has been
implemented in a graphics controller-processor having the
general structure shown in FIG. 1. This embodiment is
suitable for use with computers, such as those utilizing the
Intel family of 80X86 processors (including the PENTIUM,
PENTIUM Pro and MMX compliant technologies), running
an operating system such as Microsoft Windows NT,
designed to communicate over a Peripheral Component
Interconnect (PCI) Local Bus, pursuant to the PCI Local Bus
Specification version 2.0 published by PCI Special Interest
Group, 5200 NE Elam Young Parkway, Hillsboro, Oreg.
97124-6497, which is hereby incorporated herein by reference.
However, the embodiment may also be configured, for
example, to operate in an X-windows or other windowing
environment, and on other buses, such as the VESA local
bus (VLB), fibre channel and fibre optic buses. Note that
with a sufficiently powerful central processing unit and
sufficiently fast communication bus, for particularly complex
graphics rendering, graphics processing may be
off-loaded to the central processing unit.

FIG. 1 shows a block diagram for a preferred implemen- 
tation of the invention. The principal components are the 
PCI DMA bridge chip 102 connecting the high-speed video 
RAM buffer 104 to the PCI bus 106, the graphics engine 
circuitry 108, a set of dual resolver chips 110, a RAM DAC 
chip 112, the texture buffer 114, and the frame buffer 116. 
The basic flow of data within the high-speed video frame
buffer system starts with a host computer's processor, which
writes requests to the Request FIFO 118 inside the graphics
engine 108 via a PCI address. The graphics engine interprets
the request, breaks it down to pixel requests, and sends pixel
requests over a dedicated bus 120 (IZ bus) to the appropriate
Dual Resolver 110. (In a preferred embodiment, there may
be several Dual Resolvers.) When a Resolver module
receives a pixel request, it may alter the pixel's color, as well
as determine whether the pixel should be written to the
frame buffer. Independent of the rendering path, a Screen
Refresh module 122 inside each Dual Resolver 110 requests
data from the frame buffer 116 and sends the pixel's color
data to the RAM DAC 112, which converts the digital color
data to analog signals for display.

The Screen Refresh Module (SRM) 122 is responsible for
supplying the video stream with pixel data. The video stream
is scanline oriented: pixels are supplied starting at the left
edge of the screen and painted from left to right. When the
right edge of the screen is reached, the beam is reset to the
left edge. This process continues for the entire screen. The
memory organization in the invention is not inherently
scanline oriented, but pixel block oriented (see discussion
hereinbelow defining pixel blocking). For the 2 Mpixel case,
each Resolver is only assigned 8 pixels per scanline within
one pixel block. Pixel data includes Image, Image VLT
Context, Overlay (or Highlight), and FastClear plane sets
from the visible buffer. Some plane sets, such as FastClear,
are stored 32 pixels per word. Therefore, when the memory
controller reads FastClear, it reads enough data for the 8
pixels (for 2 MP) on the current scanline, plus the next three
scanlines. Image is stored 1 pixel per word. To reduce the
bandwidth impact of supplying data to the Pixel FIFO, the
SRM will read the dense plane sets on the first scanline and
temporarily store the portion of the word that is not used for
the current scanline. On the next scanlines, the data is
fetched from temporary storage (called Overrun RAMs)
instead of the frame buffer. What results, however, is that for
the first and fifth scanlines within a pixel block, the memory
controller must read at least one word for all of the plane sets
that comprise a pixel's visible information. On the remaining
six scanlines of the pixel block, very few words (only
Image for 102 PPP and Image and Overlay for 128 PPP) are
required. In preferred embodiments, the first and fifth
scanlines are referred to as "Long" scanlines, and the
remaining scanlines as "Short".

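The handling of a dense plane set such as FastClear during screen
refresh can be illustrated with a short Python sketch. It assumes the
2 Mpixel case described above (8 pixels per scanline per Resolver, so
one 32-bit word covers four scanlines) and an 8-scanline pixel block.
The bit ordering inside the word, the function names, and the use of a
plain list to stand in for the Overrun RAMs are assumptions made for
illustration only.

```python
PIXELS_PER_SCANLINE = 8   # per Resolver, within one pixel block (2 Mpixel case)
SCANLINES_PER_WORD = 4    # 32 bits per word / 8 pixels per scanline
SCANLINES_PER_BLOCK = 8


def is_long_scanline(line_in_block):
    """The first and fifth scanlines (indices 0 and 4) are the Long ones."""
    return line_in_block % SCANLINES_PER_WORD == 0


def split_dense_word(word):
    """Split a 32-bit word into four 8-bit groups, one per scanline.

    Assumption: the lowest byte belongs to the current scanline.
    """
    return [(word >> (8 * i)) & 0xFF for i in range(SCANLINES_PER_WORD)]


def refresh_block(dense_words):
    """Return (line, bits, source) for each scanline of one pixel block.

    dense_words holds the two words covering scanlines 0-3 and 4-7.
    """
    overrun = []  # stands in for the Overrun RAMs
    result = []
    for line in range(SCANLINES_PER_BLOCK):
        if is_long_scanline(line):
            groups = split_dense_word(dense_words[line // SCANLINES_PER_WORD])
            bits, overrun = groups[0], groups[1:]
            result.append((line, bits, "frame buffer"))
        else:
            result.append((line, overrun.pop(0), "overrun RAM"))
    return result
```

In this sketch only scanlines 0 and 4 touch the frame buffer for the
dense plane set; the other six scanlines are served from the temporary
store.
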
Flags generated by the Pixel FIFO help the SRM determine
when to start and stop requesting more pixels from the
Resolver's memory controller. To generate the flags, the
FIFO compares the current depth of the FIFO with programmable
"water marks". If the current depth is lower than the
low water mark (LWM), then the SRM begins requesting
data. If the current depth is higher than the high water mark
(HWM), then the SRM quits requesting data.

For long scanlines, the worst case latency is from when
the low water mark (LWM) is reached to when memory
actually begins to supply Image data. Also, the instantaneous
fill rate is potentially very low on long scanlines. While the
memory controller is filling the pixel FIFO, it cannot service
any graphics requests in its IZ input FIFOs. Therefore, for
long scanline cases, if the memory controller waits until the
pixel FIFO is full before it services any IZ requests, then the
IZ input FIFOs will fill, the IZ bus will stall, and system
performance will be lost. For long scanlines, the requirements
on the water marks may be summarized as (1) set
LWM high enough so the pixel FIFO won't go empty under
the worst case latency conditions; and (2) set HWM low
enough to minimize the time the IZ bus stalls. For short
scanlines, the worst case latency is better than for long
scanlines. Latency is shorter because there are fewer (or no)
plane sets to read in front of the Image planes. Also, the
instantaneous fill rate is very high, so it will take much less
time to fill the pixel FIFO for short scanlines than for the
long ones. These features imply that LWM may be set lower
for short scanlines than for long scanlines, and that HWM
should be set as high as possible with short scanlines to
minimize the overhead of beginning and ending screen
refresh cycles. Since the requirements on water marks
conflict for short and long scanlines, the SRM uses two
different sets: LWM1 and HWM1 when it is requesting
pixels for "long" scanlines, and LWM2 and HWM2 when it
is requesting pixels for "short" scanlines. In preferred
embodiments, these values are programmable.
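
The water mark behavior described above amounts to hysteresis with
two programmable threshold pairs. The following Python sketch is one
way to model it. The class name, the method name, and the numeric
water mark values in the usage note are hypothetical and are not taken
from the embodiments.

```python
class RefreshRequester:
    """Hysteresis model of the SRM request decision (illustrative only)."""

    def __init__(self, long_marks, short_marks):
        # Each argument is a (low_water_mark, high_water_mark) pair:
        # long_marks plays the role of (LWM1, HWM1) and
        # short_marks the role of (LWM2, HWM2).
        self.marks = {"long": long_marks, "short": short_marks}
        self.requesting = False

    def update(self, fifo_depth, scanline_kind):
        """Return True while the SRM should keep requesting pixel data."""
        low, high = self.marks[scanline_kind]
        if fifo_depth < low:
            self.requesting = True    # FIFO is draining: start requesting
        elif fifo_depth > high:
            self.requesting = False   # FIFO is full enough: stop requesting
        # Between the two marks the previous decision is kept.
        return self.requesting
```

For example, with assumed marks RefreshRequester((40, 48), (16, 60)),
a depth of 39 on a long scanline starts requests, a depth of 45 keeps
them going, and a depth of 49 stops them. On a short scanline the same
FIFO is allowed to drain to below 16 before requests restart and to
fill past 60 before they stop.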

If the Screen Refresh Module requests the last visible
pixel on the display, it will stop requesting data, even if it has
not reached its HWM. This feature is present so that
software has additional time during vertical blank to swap
buffers before the SRM accesses the "visible" pixels for the
upper left region of the screen. If this artificial stall is not
introduced, then visual integrity could be degraded for that
region for some frames. The SRM will begin requesting
pixels for the Pixel FIFO after it receives a restart signal
from the VSG approximately one half-line before vertical
blank ends. Note that the Pixel FIFO will go completely
empty once per frame.

For storing the video display information, a preferred
embodiment uses single-ported SDRAMs in the frame
buffer and texture buffer. However, a preferred embodiment
need not be limited to SDRAMs, and reference to SDRAMs
is intended to encompass use of equivalent RAMs. In
contrast, prior art video frame buffers stored their information
in VRAM-type memory chips. These chips were dual-ported,
meaning that the video board could read and write to
video memory simultaneously, and resulted in parallel processing
with fairly high performance video frame buffers.
Until the present invention, video frame buffers using
dual-ported RAM represented the best the frame buffer industry
could offer. As will be explained hereinbelow, using
SDRAM type of memory, instead of VRAM memory, while
raising the complexity associated with memory access, also
greatly increases performance.

In a preferred embodiment, a texture processor and a
graphics engine are integrated into a single chip 124. By
placing both into the same chip, it is possible to double the
clock rate of the video card, as there are no external bus
technologies to consider. An issue relevant to a single-chip
design, however, is that memory access is more complex.
In the present invention, the texture processor directly
accesses the texture memory 114 via a dedicated bus 126.
The graphics engine 108 does not have direct access to the
frame buffer 116; instead the graphics engine 108 sends
pixel commands to the resolvers 110, whereupon the resolvers
110 directly access frame buffer memory 116.

Communication between the graphics engine 108 and the
resolvers 110 over the specialized bus 120 is span-oriented.
In the prior art Edge III graphics processor, the Resolver does
not know whether the original graphics request was for a
triangle, vector, PutBlock, or BitBlt, because the graphics
engine breaks all of these operations into spans. The Resolver
also does not know how many pixels are involved with most
operations when it receives the IZ headers (over the IZ bus 120)
and first piece of data for a request. Although the Resolver
receives little information concerning the type of request, it
must react to the data as efficiently as possible. In a preferred
embodiment, the graphics engine 108 groups requests into
three different categories: (1) Block requests (long-span,
block-oriented requests such as PutBlock, GetBlock,
RecFill, FastRecFill, etc.), (2) Blit requests (blits consist of
first reading a short subspan and then writing the subspan to
a different part of the screen), and (3) Region requests
(multiple spans with a high probability of pixel-block
locality, such as triangles; vectors are lumped into this
request type). For Resolver Read, Write, and Fill requests,
the IZP then sets two IZ Request Type bits in the first IZ
header to indicate which category of request is being sent.

A preferred embodiment implements page crossing algorithms
based on the request category identified by the
graphics processor 108 in the request made to the communication
bus 120. The Resolvers 134, 136 optimize their
page crossings differently according to the data transfer
category. Optimizing page crossings is important, since the
FastClear cache is filled and flushed during page crossings.
Indiscriminate page crossings, therefore, cost performance.
The two different page crossing modes are discussed below.
Each mode corresponds to a specific request category. Note
that SDRAMs have two banks. One bank may be accessed
while the other bank is idle, being closed, precharging, or
being opened.

Mode0: Wait. Assume a Resolver is currently accessing a
page (a "page" is synonymous with a pixel block) from
Bank0 of the SDRAMs. When the Resolver stops accessing
Bank0 and begins accessing a page from Bank1, close the
page in Bank0. The Resolver may then access Bank1 while
Bank0 is precharging. Wait until future activity specifically
requires another pixel block in Bank0 before opening that
pixel block.

Mode1: Force. Do not close a page until a new
page needs to be opened in an already-opened bank.

As an example, a span in a PutBlock request will typically
touch many pixel blocks horizontally (it only takes a span
longer than 65 pixels in the 2 Mpixel FB to straddle three
pixel blocks). When a page is closed, it will not be touched
again until the next span. Therefore, the Mode0 page crossing
algorithm is more appropriate than Mode1.
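
The difference between the two modes can be made concrete with a toy
model in Python. This is a simplified sketch under stated assumptions:
it tracks only which page is open in each of two banks, it ignores all
real SDRAM timing, and the function name run_policy, the mode strings,
and the notion of a "stall" (a page that must be closed at the very
moment another page is needed in the same bank) are inventions of the
example rather than terms from the embodiment.

```python
def run_policy(accesses, mode):
    """Replay (bank, page) accesses under a page crossing policy.

    mode is "wait" (modeled on Mode0) or "force" (modeled on Mode1).
    Returns (events, stalls).
    """
    if mode not in ("wait", "force"):
        raise ValueError("mode must be 'wait' or 'force'")
    open_page = {0: None, 1: None}
    events = []
    stalls = 0
    for bank, page in accesses:
        if open_page[bank] != page:
            if open_page[bank] is not None:
                # The old page was left open, so its close (and precharge)
                # now sits directly in front of the new open.
                events.append(("close", bank, open_page[bank]))
                stalls += 1
            events.append(("open", bank, page))
            open_page[bank] = page
        other = 1 - bank
        if mode == "wait" and open_page[other] is not None:
            # Close the bank just left so it precharges during this access.
            events.append(("close", other, open_page[other]))
            open_page[other] = None
        events.append(("access", bank, page))
    return events, stalls
```

For a span that walks through pages A (Bank0), B (Bank1) and C (Bank0),
the "wait" policy closes A while B is being accessed and records no
stalls, whereas the "force" policy has to close A only when C is needed
and records one stall. For an access pattern that bounces between the
same two pages, the model shows the opposite cost: "wait" reopens a page
on every bounce while "force" opens each page once.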

For storing pixel data within memory, in a preferred
embodiment, a complex method, referred to herein as
pixel-packing or packed-pixel format, is used to store graphics
information. A frame buffer contains information about
pixels, and a pixel is the basic unit within a graphics
environment. A collection of pixels forms the screen of the
display monitor used to show the graphics output. In the
prior art, VRAM type memory chips are used to store the
attributes that control the display of the pixels, and all data
associated with a pixel is stored in the same word in VRAM
memory. Consequently, if 124 bits were associated with
each pixel, of which 24 were used for recording color
intensity (i.e. 8 bits each to encode red, green, and blue color
information), there would be 100-bit gaps in the VRAM
memory between occurrences of pixel coloration information.
In an environment where getting access to such information
is the most important task, this spreading out of the
information is not the most efficient arrangement for the
pixel information.
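
The arithmetic behind the 100-bit gap can be checked with a small
Python sketch. It assumes, purely for illustration, that the color
field sits at the start of each pixel's record; the function name and
the returned tuple layout are likewise assumptions.

```python
def color_field_layout(bits_per_pixel, color_bits, pixel_count):
    """Describe full-pixel storage where each pixel's data is contiguous.

    Returns (gap_bits, useful_fraction, color_offsets), where color_offsets
    are the bit offsets at which each pixel's color field begins.
    """
    gap_bits = bits_per_pixel - color_bits
    useful_fraction = color_bits / bits_per_pixel
    color_offsets = [i * bits_per_pixel for i in range(pixel_count)]
    return gap_bits, useful_fraction, color_offsets
```

Calling color_field_layout(124, 24, 4) gives a gap of 100 bits, color
fields starting at bit offsets 0, 124, 248 and 372, and a useful
fraction of 24/124, so under these assumptions less than a fifth of the
memory traversed during a refresh carries color information.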

A preferred embodiment reduces the inefficiency by subdividing
the display region into many logical rectangular
pixel blocks, where each pixel block contains the pixels for
that region upon the video display. For each logical pixel
block, there is a corresponding region of video RAM. This
region of RAM is broken into a display partition and a
non-display partition. Unlike the prior art, preferred embodiments
arrange the information for pixels within the pixel
block so that the display intensity (e.g. color) values are
stored within the display partition, and the rest of the
information for the pixels is arranged by plane category
and stored in the non-display partition. For example, single-bit
planes, such as the fast-clear and select buffer planes, are
packed into consecutive addresses in memory. Thus, on a
machine with a data size of 32 bits, a single read will obtain
clear information for 32 pixels, rather than the 32 separate
reads required in the prior methods of linear pixel-data
arrangement. A preferred embodiment stores
horizontally-adjacent pixels at sequential memory
addresses.

After packing the pixel data as described hereinabove, a
preferred embodiment is able to take advantage of SDRAM
burst mode. Burst mode means that a memory access
controller may be given just a starting memory address, and
consecutive memory addresses may be read without having
to specify the addresses for the consecutive memory locations.
Thus, it is not necessary, as with prior art methods
such as VRAM, to expend processor cycles to constantly
supply memory addresses when the information to be read
lies in consecutive memory locations. Since the present
invention stores pixel information in linear memory
addresses, and relocates other non-display related
information, the invention is able to utilize burst mode and
greatly exceed prior art performance. In a preferred
embodiment, manipulations of pixels that require some
combination of read or write access to memory will be
collected into a variable length burst of reads, and if
applicable, followed by a variable length burst of writes.

A preferred pixel packing arrangement also reduces the
bus width needed from each resolver to the frame buffer
memory it controls. Also, the invention is able to quickly
toggle the sense of which buffer is to be displayed: in just a
few clocks, all of the select buffer planes can be written for
all of the pixels. Further, it is possible to quickly fill and
flush the fast clear cache inside the resolvers (the fast clear
cache is more thoroughly discussed hereinbelow). And, to
perform a screen refresh, it is not necessary to waste any
clock cycles reading unnecessary information from the
frame buffer as all relevant information has already been
placed in the display partitions of the video memory regions.
Related to this, preferred embodiments are able to quickly
read and write just the planes (e.g. image and Z buffer) that
are involved in the rendering process.

Optimizing rendering is crucial because one of the most
complex graphics tasks is the simulation and animation of
three dimensional objects. In current state of the art systems,
realistic representation of a real-world object, such as a
bowling ball, is performed by logically breaking the
object into many tiny triangles that may then be manipulated
by a video frame buffer. This frame buffer then processes
(e.g. renders) the triangles for display upon a display screen.
With sufficiently small triangles, the rendered image may
appear very realistic. In a preferred embodiment, the packing
method for display pixels is such that usually at least one
of the triangles will fit within a video RAM region. This means
that the video frame buffer is able to render an entire triangle
in burst-mode, resulting in a substantial performance
increase over the prior art. In addition, since a three-dimensional
object is usually created by many triangles that
are touching each other, the next triangle to be drawn is
likely to be in another video RAM block. This allows for
queuing a chain of pixel blocks to be burst-mode displayed.
The present invention takes advantage of the SDRAM
burst-mode displaying by supplying the next memory
address of a memory page to display upon the monitor while
the invention is currently reading and displaying the previous
memory page. In preferred embodiments, a burst mode
algorithm employed during rendering will allow grouping
memory accesses together for pixels residing within a pair of
pixel blocks, so long as the pixel blocks come from opposite
banks within the SDRAMs. In this fashion, during the
drawing process, no extra clock cycles are wasted on opening
or precharging memory locations.

To further increase performance, and as explained in more
detail hereinbelow, in preferred embodiments, video
memory is broken into memory banks. Within a pixel
block, a vertical stripe of pixels is stored within one memory
bank. Each adjacent stripe of pixels is stored within a
different memory bank. When eight memory banks are
used, each memory bank stores every eighth stripe. By
storing pixels in this fashion, in addition to having burst
mode access to the frame buffer, the invention may perform
pixel operations in parallel.

Such parallelism is achieved through use of multiple
resolvers within the invention. The resolver is the logic
necessary to build an image in the frame buffer and for
sending pixel display data to a monitor. Preferred
embodiments use resolvers operating in parallel to process
[...] the present invention is able to
achieve enormous graphics throughput.

Another aspect of the present invention concerns the SDRAM
memory that provides the physical storage of the video
memory. Each location of an SDRAM memory bank contains
two sub-memory locations (referred to as sub-banks). It is
possible, while one sub-bank is being used, to simultaneously
prepare the other for future use. Due to a latency involved
with setting up the other sub-bank, alternating sub-banks are
used when storing display pixel information. That is, for a
series of pixel blocks, the pixels will be stored in alternating
banks. In effect, this arrangement is what makes burst-mode
possible. As one sub-bank is being used, by design of the
SDRAMs, another can be simultaneously prepared for future use.

For animation of three-dimensional objects, a preferred
embodiment also supports performing fast clears. As with
the other values stored for a pixel, another stored attribute is
whether the pixel is going to be cleared (un-displayed) in the
[...]. That is, this information is in addition to the
invention's storing RGB, alpha, Z buffer, stencil, and overlay
information for a particular pixel. So as to speed up the
animation process, preferred embodiments store clear
information for many pixels in a single location. The grouping of
clear bits is designed to correspond with the video RAM
blocks. Consequently, when reading in the values for the
pixels within a block, the video frame buffer is able, in a
single memory access, to read the clear information for an
entire group of pixels. This arrangement in effect caches the
information for other pixels. When this is coupled with
memory accesses being performed in burst-mode, the pixel
clearing process is faster than prior art methods. Preferred
embodiments of the invention will have a cache internal to
the resolvers for maintaining fast clear bits.

The present invention incorporates a highly integrated
ASIC chip that provides hardware acceleration of graphics
applications written for the OpenGL graphics standard used
in windowing environments.

In preferred embodiments, the high-speed video frame
buffer supports a wide variety of graphics users and
applications. These features include scalable display architecture,
multiple display sizes, multiple frame buffer configurations,
variable amounts of texture memory, and high performance.
The present invention will also accelerate OpenGL's high-end
visualization features to speed up operations such as
texture mapping, alpha blending, and depth queuing via an
integrated Z-buffer.

In order to reduce the cost of the high-speed video frame
buffer, the present invention will preferably allow floating
point processing to be performed by the system CPU. In
more advanced embodiments of the invention, an optional
accelerator may be used instead to off-load work from the
system CPU. In addition, to reduce the size of the invention,
the rendering ASICs will preferably be packaged in 625-pin
and 361-pin ball grid-type arrays, the frame buffer memory
will be stored in high-density (preferably at least 16-Mbit)
SDRAMs, the optional texture memory will be available on
vertically-installed DIMMs, and preferred implementations
of the invention will be configured as single or dual-PCI card
subsystems.

Preferred embodiments will allow a high-level programming
interface to the invention, driven by packets of graphics
requests for vectors, triangles, fills, blits, and others. In
addition to these general features, the high-speed video
frame buffer will preferably support storing images in 24-bit
double-buffered image planes and accelerating OpenGL
operations such as stencil functions with, in preferred
embodiments, 8 Stencil planes per pixel, ownership tests
(masking), scissor tests (clipping of triangles and vectors),
alpha blending, and z-buffering. The invention, in preferred
embodiments, will also support texturing features (if texture
memory is installed on the card) such as texturing of lines
and triangles through trilinear interpolation, 32 bits per texel
(RGBA), storage of mipmaps in a variable-size texture
buffer, from 4 to 64 Megabytes, partial mipmap loading,
1-texel borders around the texture images, multiple texture
color modes, such as 4-component decals and 1-component
(luminance) texture maps. Preferred embodiments will also
[...]

In preferred embodiments, the primary function of the
PCI DMA 102 device is to increase the speed at which
requests are received by the graphics engine. The PCI DMA
102 device has bus master and direct memory access (DMA)
capabilities, allowing it to perform unattended transfers of
large blocks of data from host memory to the Request FIFO.
In addition, the texture processor 128 inside the combined
graphics engine/texture processor ASIC 124 may optionally
perform pre-processing of pixels before they are sent to the
Dual Resolvers 110. This extra processing may be used to
add a texture or some other real-world image to a rendered
object. In preferred embodiments, the texturing step would
[...] to the resolvers 110.

During operation of an embodiment of the present
invention, the graphics engine 108 receives requests from a
host processor via the PCI bus. It buffers the requests in its
Request FIFO 118. The graphics engine 108 reads from the
Request FIFO 118, decodes the request, and then executes
the request. Requests are usually graphic primitives that are
vertex oriented (i.e. points, lines, and triangles), rectangular
fills, gets and puts of pixel data, blits, and control requests.
The graphics engine's initial breakdown of the graphics
request is to the span level, which is a horizontal sequence
of adjacent pixels. The graphics engine sends span requests
over the dedicated bus 120 to the Dual Resolvers 110.
Before it sends the span request, the graphics engine 108
may texture the span. Span requests may include a fixed
color for each pixel in the request, or each pixel may have
its own color. Some of the requests may return data. The
graphics engine 108 provides this data to the application by
placing it back into the Request FIFO 118. In the preferred
embodiment of the present invention, there is only one
Request FIFO 118, and it operates in a half-duplex fashion
(either in input mode or output mode).

In preferred embodiments, the texture processor 128
inside the integrated ASIC 124 writes and reads the texture
buffer 114. To prepare for texturing, software first loads a

provide a FastOear function Cor rapidly clearing large family of images into texture memory. The family is called 

regions of the screen, support for the Display Data Channel a rnipmap. Amipmap includes an original image and sma l ler 

proposal from VESA for monitor identification, Dynamic versions of the same image. The smaller versions represent 

Contrast Mapping (DCM) per Image Context to map 16-bit the image as it would be seen at a greater distance from the 

frame buffer data to 8-bit display data in the back end video 45 eye. In preferred embodiments, a partial rnipmap set can be 

stream in real time, and generation of video timing for loaded into the invention's texture memory. A texture space 

industry-standard multiply synchronous monitors, as well as is treated as a collection of sub-blocks. Say a lKxlK space 

for specialized monitors such as Intcrgraph's Multiple Sync ^ tfled with 64x64 sub-blocks, and each sub-block can be 

monitors. replaced independently. When dealing with very large tex- 

Preferred embodiments of the invention will also support 50 tore sets, it's possible to load the neighboring sub-blocks 

various screen display modes such as Monoscopic, Dual- with the appropriate texture data from the large set rather 

Screen, Interlaced Stereo. t ^ _than use OpenGL borders. However, the state may arise 

Sequential for HeacVMounted Displays, VGA compatibility where two neighboring sub-blocks are from non-adjacent 

(as well as allowing concurrent residency within a computer ^ areas of the larger map. In this case, a preferred embodiment 

with 3^-party VGA cards). The invention will provide, in will not blend data from these two blocks into the rnipmap 

preferred embodiments, at least 2.6 million pixels in mono- collection. Texture memory 114 looks like frame buffer 116 

scopic single-screen mode, and, at least 13 million pixels memory to the graphics engine 108, and is loaded by normal 

per field for stereo modes. put and fill operations, or is read back by normal get 

Preferred embodiments will also provide features to 60 °P eratlons - 

enhance performance and visual integrity of both interlaced A preferred embodiment's blending function is different 

and frame-sequential stereo images. Such embodiments will over the prior art in that it reduces the maximum absolute 
allow programmable control over inhibiting the draws of error and obtains Qxff*anythmg=anything. This technique 

pixels to either even or odd scanhnes without checking the was implemented for the bilinear blends within a texture 

frame buffer's mask planes, as well as programmable control 65 map, the linear blend between texture mipmaps, and the final 

over drawing to both the even and odd fields of stereo blend between the fragment color and the texture color. A 

images through one request from software. summary of sorts of results is given below. 
              Maximum absolute error in
              output for N fractional bits,     Mean of All
              ABS(theoretical - actual)         Possible Blend
Method        5-bit frac      8-bit frac        Results

Current       4.419           1.439             127.0
Proposed      4.065           0.910             127.5

The theoretical results are the real values that would result
given the raw integer fraction and operand inputs.
Previously, the linear blending hardware implementation
using "out=(1-f)*a+f*b" could be described by the following
pseudo-code for 8 fractional blending bits.

    a, b are the two operands to blend between.
    f is the fraction of b desired in the blended result.
    (1 - f) is the fraction of a desired in the blended result.
    out is the result of the linear blend.

    fa = ((~f << 1) | 1) & 0x1ff;    /* 9 bit fractions, with */
    fb = (( f << 1) | 1) & 0x1ff;    /* rounding LSB added.   */
    out = (fa*a + fb*b) >> 9;

A rounding bit has been added to each of the a and b
operands. This reduces the maximum absolute error and
yields 0xff*0xff=0xff. The output result of 0xff is only
obtained in the previous blend method if both a and b are
0xff. This biases the results slightly towards 0, demonstrated
by a mean blended result of 127.0. Under the invention's
blending method, the mean blended result is 127.5 (that is,
255.0/2.0). In fact, the distribution of blended results and
maximum absolute error are symmetric across the output
range about 127.5 for all possible inputs. The proposed
blend in C code is the following.

    fa = ((~f << 1) | 1) & 0x1ff;
    fb = (( f << 1) | 1) & 0x1ff;
    ra = (( a << 1) | 1) & 0x1ff;    /* Add rounding bit to */
    rb = (( b << 1) | 1) & 0x1ff;    /* a and b as well.    */
    out = (fa*ra + fb*rb) >> 10;
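The two blends can be compared side by side with a small test harness. The following is a minimal sketch, not the hardware implementation: the function names blend_old, blend_new and blend_mean are hypothetical, and unsigned arithmetic is assumed so that the ~f shift is well defined in C.

```c
#include <assert.h>

/* Earlier blend: 9-bit fractions (rounding LSB added), 8-bit operands. */
static unsigned blend_old(unsigned a, unsigned b, unsigned f)
{
    unsigned fa = ((~f << 1) | 1u) & 0x1ffu;  /* weight of a, 2*(255-f)+1 */
    unsigned fb = (( f << 1) | 1u) & 0x1ffu;  /* weight of b, 2*f+1       */
    return (fa * a + fb * b) >> 9;
}

/* Proposed blend: a rounding LSB is also added to both operands. */
static unsigned blend_new(unsigned a, unsigned b, unsigned f)
{
    unsigned fa = ((~f << 1) | 1u) & 0x1ffu;
    unsigned fb = (( f << 1) | 1u) & 0x1ffu;
    unsigned ra = (( a << 1) | 1u) & 0x1ffu;  /* 2*a+1 */
    unsigned rb = (( b << 1) | 1u) & 0x1ffu;  /* 2*b+1 */
    return (fa * ra + fb * rb) >> 10;
}

/* Mean result over every 8-bit combination of a, b and f. */
static double blend_mean(unsigned (*blend)(unsigned, unsigned, unsigned))
{
    unsigned long long sum = 0;
    unsigned a, b, f;

    assert(blend != 0);
    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++)
            for (f = 0; f < 256; f++)
                sum += blend(a, b, f);
    return (double)sum / (256.0 * 256.0 * 256.0);
}
```

Calling blend_new(0, 0xff, 0xff) returns 0xff, whereas blend_old with the same arguments falls one short at 0xfe, which illustrates the bias toward 0 described above. blend_mean can be applied to either function to check the means against the table.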
The hardware gate count and timing path delay impacts of
this new blending method are minimal. Logic synthesis is
able to take advantage of the fact that the LSB of the a and 
b operands are always 1. In preferred embodiments, the 
hardware is implemented in a partial sum adder tree. 
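One way to see why the added cost is small is to expand the proposed blend algebraically. This is a worked identity rather than a description of the synthesized logic; it assumes the fraction terms are formed as in the C code, so that fa + fb = (2(255 - f) + 1) + (2f + 1) = 512 for every f.

```latex
\begin{aligned}
f_a\,(2a+1) + f_b\,(2b+1) &= 2\,(f_a a + f_b b) + (f_a + f_b) \\
                          &= 2\,(f_a a + f_b b) + 512, \\[4pt]
\text{out} = \left\lfloor \frac{f_a\,(2a+1) + f_b\,(2b+1)}{1024} \right\rfloor
           &= \left\lfloor \frac{f_a a + f_b b + 256}{512} \right\rfloor .
\end{aligned}
```

Under that assumption, the proposed blend equals the earlier sum of products with a constant of 256 (half of the divisor 512) added before the shift, that is, rounding to nearest instead of truncating.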

In a preferred embodiment, software sends textured 
requests (triangle or vector requests containing texture 
coordinates). When the graphics engine 108 receives a 
textured request, it sends special span requests to the texture 
processor's input FIFO 130. The texture processor 128 
textures the pixels within the span, and places the resulting 
values in its output FIFO 132. The graphics engine 108
transfers these altered spans to the Dual Resolver chips 110. 

Preferably the high-speed video frame buffer is composed 
of either two or four Dual Resolver chips 110. Each Dual 
Resolver is built from three main modules: two resolver 
modules 134, 136 and one Screen Refresh module 122. A 
resolver module 134, 136 is responsible for translating span 
requests into manipulations of the frame buffer 116, while
the Screen Refresh module 122 is responsible for sending 
pixel data to the RAM DAC 112. In addition to reading and
writing the frame buffer, in preferred embodiments the
resolvers 134, 136 also perform masking, alpha tests, Z 
buffering, stencil tests, frame buffer merges (read/modify/ 
write operations such as alpha blending and logical 
operations), and double-buffering. A resolver 134, 136 
receives requests from the graphics engine 108 in its input 
FIFO 138, 140, and parses the request and translates it into 
a series of frame buffer reads and/or writes. After performing
the appropriate operations on the pixel data, the resolver 
then determines whether or not to write the pixel. Further, if 
it is going to write a pixel, the resolver determines which 
planes it will write. Each resolver is only responsible for a 
subset of the pixels on the screen. Therefore, each resolver 
only reads and writes the portion of the frame buffer that it 
"owns". 

The Screen Refresh module has a pixel FIFO 142. This 
FIFO supplies pixels (digital RGB plus Image Context) to
the RAM DAC 112 for display on a monitor 144. To keep 
the FIFO from emptying, the Screen Refresh 122 module 
requests pixel data from the two resolver modules 138, 140 
within the same Dual Resolver chip 110, which in turn read 
the frame buffer 116. As long as the Screen Refresh module 
122 requests pixel data, both of the resolver modules 138, 
140 continue to supply data. After the pixel FIFO 142 has 
temporarily stored enough pixels, the Screen Refresh mod- 
ule stops the requests, and the resolvers 138, 140 may return 
to other operations. 

The Screen Refresh module 122 also interprets the color 
of a pixel. Since a pixel may consist of double-buffered 
image planes, double-buffered overlay planes, double- 
buffered fast clear planes, and double-buffered image con- 
text planes, the Screen Refresh Module must determine 
which planes drive a pixel's color. After it determines the 
pixel's color, the Screen Refresh module may also map the 
pixel through the DCM logic 144. This special-purpose logic 
maps 16-bit pixel data (stored in the red and green planes) 
into 8-bit data. When this feature is enabled, the Screen 
Refresh module replicates the 8-bit result onto its red, green, 
and blue outputs. 
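The DCM step described above can be sketched as a table lookup. This is an illustration under stated assumptions only: the source does not say which of the red and green planes holds the high byte, nor how the lookup table is organized, so the byte order, the 65536-entry table, and the names rgb8, dcm_map and dcm_demo are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb8;

/* Map a 16-bit value held in the red and green planes to 8 bits and
 * replicate the result onto all three outputs.
 * Assumption: the red plane is the high byte, green the low byte. */
static rgb8 dcm_map(uint8_t red_plane, uint8_t green_plane,
                    const uint8_t *lut /* 65536 entries, hypothetical */)
{
    uint16_t value;
    rgb8 px;

    assert(lut != 0);
    value = (uint16_t)(((unsigned)red_plane << 8) | green_plane);
    px.r = px.g = px.b = lut[value];
    return px;
}

/* Demo with a trivial table that keeps the top 8 bits of each value. */
static rgb8 dcm_demo(uint8_t red_plane, uint8_t green_plane)
{
    static uint8_t lut[65536];
    unsigned i;

    for (i = 0; i < 65536u; i++)
        lut[i] = (uint8_t)(i >> 8);
    return dcm_map(red_plane, green_plane, lut);
}
```

A real contrast mapping would load a different table per Image Context; the truncating table here only makes the data flow visible.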

In a preferred embodiment, the frame buffer 116 has a full 
set of planes for each pixel. Each plane for each pixel 
represents information being tracked for that pixel. Planes 
are logically bundled into sets. The three most common 
plane sets are red, green, and blue, representing the pixel's 
display color. In the present invention there are over 100 
planes of information per pixel. One such plane set is 
Overlay. These planes, if transparent, allow the values of the 
Red, Green, and Blue planes (hereinafter the Image planes) 
to show. If the Overlay planes are opaque, however, one 
viewing the display would see the Overlay values replicated 
onto all three RAM DAC 112 channels. In the present 
implementation of the invention, a special case exists for one
of the configurations when the Overlay is only 1-bit (single- 
plane) double-buffered (hereinafter the "Highlight" case). In 
the Highlight case, a static 24-bit register value is displayed 
when Highlight is opaque. 
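The Overlay and Highlight rules above amount to a short decision per pixel. The sketch below is illustrative only: the struct layout, the transparent-value parameter and the name refresh_color are hypothetical, Fast Clear and DCM handling are left out, and the front-buffer select bits are the Select Buffer planes described later in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked view of the displayable planes of one pixel. */
typedef struct {
    uint32_t image[2];        /* 24-bit RGB, buffer 0 and buffer 1   */
    uint8_t  overlay[2];      /* overlay value, buffer 0 and 1       */
    uint8_t  select_image;    /* which image buffer is the front one */
    uint8_t  select_overlay;  /* which overlay buffer is the front   */
} pixel_planes;

/* Return the 24-bit color that drives the display for this pixel. */
static uint32_t refresh_color(const pixel_planes *p, uint8_t transparent,
                              int highlight_mode, uint32_t highlight_rgb)
{
    uint8_t ov;

    assert(p != 0);
    ov = p->overlay[p->select_overlay & 1];
    if (ov == transparent)            /* transparent: Image planes show */
        return p->image[p->select_image & 1] & 0xffffffu;
    if (highlight_mode)               /* 1-bit overlay: static register */
        return highlight_rgb & 0xffffffu;
    /* opaque overlay: value replicated onto all three channels */
    return ((uint32_t)ov << 16) | ((uint32_t)ov << 8) | ov;
}
```

With a transparent front overlay the function returns the front image buffer; with an opaque overlay value of 0x7f it returns 0x7f7f7f, and in the Highlight case it returns the register value passed in.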

There are image planes that are double-buffered 24-bit 
planes. All 24 planes represent 24-bit RGB color. It is possible
to configure the invention to represent 16-bit image data 
mapped to 8-bit data through dynamic contrast mapping, or 
to assert pseudo-color mode to assign an image arbitrary 
colors. 

Of the over 100 planes per pixel, only some of them are 
ever visible upon a display monitor. Of these, there are 
Image Context planes that are Double-buffered 4-bit planes. 
These planes select dynamic contrast mapping lookup table 
entries in the screen refresh modules, and also choose the
appropriate set of video lookup tables in the RAM DAC 112.
There are also Fast Clear planes that are Double-buffered
single-bit planes. These planes, if set, indicate that the frame
buffer contents for the pixel are [...] register are newer.
There are also Select Buffer Image planes that are
Single-buffered single-bit planes. These planes
indicate which buffer of Image, Image Context, and Fast
Clear is the front buffer. There are also Overlay planes that
are Double-buffered single-bit, 4-bit, or 8-bit planes,
depending on operation mode of the invention (controlled by
software). These planes are displayed if their value is
"opaque". Otherwise, the image layer displays. There are
also Select Buffer Overlay planes that are Single-buffered
single-bit planes. These planes indicate which buffer of
Overlay is the front buffer.

The planes that are not actually displayed upon a monitor,
but are only used in the generation of images to eventually
be displayed, are referred herein as the construction planes.
Of these, there are Alpha planes, which are single-buffered
8-bit planes. The Resolver 134, 136 performs alpha blending
with these planes. There are also Stencil planes that are
Single-buffered 6-bit or 8-bit planes. These planes hold the
value of the stencil buffer. In preferred embodiments, they
support OpenGL stencil operations. There are also Z planes
that are Single-buffered 24-bit or 32-bit planes. These planes
hold the value of the Z buffer. In preferred embodiments,
they support OpenGL depth testing. There are also Mask
planes that are Single-buffered 2-bit or 4-bit planes. Mask
planes are used in conjunction with reading and writing
image data. In preferred embodiments, enabled mask planes
can inhibit writes on a per-pixel basis.

Certain planes are logically grouped together. For
example, in preferred embodiments, writes to the frame
buffer are made to a "visual", which is a set of related planes.
For example, in the present invention, visual 2 is the Image
visual. It primarily accesses the image (RGBA) planes, but
it can also affect Z, Stencil, and Image Context planes.
Preferably, only planes included in the visual are affected by
the operation. The Image Context planes are only included
as implied data: their value is sourced by a static register in
the graphics engine. Enable and disable the writing of
implied data separately via plane enables.

When an image is to be displayed, display information is
given to the RAM DAC 112 for conversion into signals
compatible with viewing monitors. In preferred embodiments
of the present invention, the RAM DAC has three
primary functions: provide the palette RAM for mapping
incoming RGB to input data for the digital to analog
converter (DAC), provide a 64x64 hardware cursor, and
convert digital RGB to analog RGB. In preferred
embodiments, four sets of video lookup tables are available
in the RAM DAC. The Image Context values sent with each
pixel determine which lookup table maps the pixel. The
lookup tables output three 10-bit values (one value each for
red, green, and blue), which are sent to the DAC. 10-bit
values allow more flexible storage of gamma correction
curves than 8-bit values. Recall that the particular bit widths
are dependent on the RAM architecture chosen, which in
present embodiments, is SDRAMs.

In preferred embodiments of the 128 PPP configuration,
the concept of Highlight and Overlay planes will be implemented
through visuals 0 or 1. Preferred embodiments
intend to use visual 1 exclusively to access Overlay. For the
102 PPP embodiment of the invention, supporting Highlight
and Overlay is more complex. In this embodiment, one
double-buffered plane serves as an Opaque plane, and
ImageNibble6 (NIB6; the nibbles in memory stored with both
buffers of R, G, B, and IC) serves as either four-bit,
double-buffered Overlay or eight-bit Alpha. In this
embodiment, "Opaque" reflects the state of a layer of planes
that can either obscure the underlying image (when opaque),
or [...] Overlay. When the Overlay value matches the
transparent Overlay value, the Opaque bit is clear. For all
other Overlay values, Opaque is set.

Preferred embodiments also generate Video timing signals.
These signals are generated by a programmable module
inside the graphics engine, referred herein as the Video Sync
Generator (VSG) 146. The VSG generates horizontal and
vertical timing markers, in addition to synchronizing the
Screen Refresh modules 122 with the video stream. The
RAM DAC 112 receives the sync signals that the VSG
generates, and sends them through its pixel pipeline along
with the pixel data. The RAM DAC then drives the monitor's
sync and analog RGB signals.

Preferred embodiments will also detect the
presence of feed-through display signals 148 that may or
may not undergo processing by the invention before being
displayed upon a monitor. Such signals could be the input of
[...] information that is to be directly displayed upon the
monitor, as well as feed-through VGA signals.

The present invention provides pixel-mode double-buffering,
wherein a single bit per pixel determines which
image-related planes are to be displayed. Similarly, an
additional bit per pixel determines which overlay planes are
to be displayed. In preferred embodiments, the high-speed
video frame buffer supports four frame buffer combinations.
These configurations are derived from the possible
combinations of pixel depth (number of planes per pixel), and the
number of Resolvers 110 installed. Preferred embodiments
will support at least two pixel depth options: 102 planes per
pixel and 128 planes per pixel. The following table shows
the available plane sets in the 128 PPP (planes per pixel)
configuration:

                                     Planes       Total Planes
Plane Set                Buffering   Per Pixel    Per Pixel

Image                    double      24           48
Image VLT Context        double       4            8
Fast Clear               double       1            2
Overlay                  double       8           16
Mask                     single       4            4
Z Buffer                 single      32           32
Alpha                    single       8            8
Stencil                  single       8            8
Select Buffer Image      single       1            1
Select Buffer Overlay    single       1            1

Preferably, choosing between supported pixel depths and
modes is through setting a bit within a special purpose
register contained in each Resolver.

Independent of the pixel depth, a preferred embodiment
will have either two or four Dual Resolver 110 devices
present. Each Dual Resolver 110 will preferably control its
own buffer memory 116. In one embodiment, each buffer
116 is four 1Mx16 SDRAM devices, so that the combinations
of preferred pixel depths and number of Dual Resolver
devices creates four preferred frame buffer (FB) embodiments:
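The four combinations can be related to raw memory capacity with a little arithmetic. The sketch below is an estimate under stated assumptions, not a reproduction of the patent's own table: it takes four 1Mx16 SDRAMs per Dual Resolver (as stated for one embodiment), ignores any packing overhead, and uses the hypothetical name fb_max_pixels.

```c
#include <assert.h>

enum {
    SDRAM_BITS = 1024 * 1024 * 16,    /* one 1Mx16 device            */
    SDRAMS_PER_DUAL_RESOLVER = 4      /* per the embodiment above    */
};

/* Upper bound on stored pixels: total memory bits / planes per pixel. */
static unsigned long fb_max_pixels(unsigned dual_resolvers,
                                   unsigned planes_per_pixel)
{
    unsigned long bits;

    assert(planes_per_pixel > 0);
    bits = (unsigned long)dual_resolvers
         * SDRAMS_PER_DUAL_RESOLVER * SDRAM_BITS;
    return bits / planes_per_pixel;
}
```

Two Dual Resolvers give 1,048,576 pixels at 128 PPP and 1,315,860 at 102 PPP; four give 2,097,152 and 2,631,720. These figures line up with the 1.0 MP, 1.3 MP, 2.0 MP and 2.6 MP frame buffers named elsewhere in this description.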
In preferred embodiments, the present invention may be
utilized in a stereo mode to allow stereo viewing of images.
When viewing in stereo mode, the number of pixels available
for each eye is half the total number of pixels.

With respect to textures in preferred embodiments of the
high-speed video frame buffer, the invention will store the
texture buffer 114 in high-density memory DIMMs. The
presence of DIMMs is preferably optional, allowing the user
to either install no DIMMs, or one pair of DIMMs. Software
automatically detects the presence of texture memory. If no
DIMMs are present, then the graphics engine renders textured
requests to the frame buffer untextured. In addition, the
invention's texturing subsystem should support a variety of
DIMMs, including Synchronous DRAMs or Synchronous
GRAMs. The texture processor should also support many
densities of memory chips, including 256Kx16, 256Kx32,
1Mx16, and 2Mx8 devices.

With respect to monitors, the high-speed video frame
buffer supports various monitor configurations, dependent
upon the amount of memory installed upon the invention,
and the properties of the monitor. A subtle point regarding
monitors stems from the high-speed video frame buffer
being organized as rectangular regions of pixels, or pixel
blocks. In a preferred embodiment, one page (row) of
memory in the SDRAMs corresponds to one pixel block. By
this architecture, the high-speed video frame buffer only
supports an integer number of pixel blocks in the x dimension.
Therefore, if a resolution to be supported is not
divisible by the pixel block width, then some pixels off the
right edge of the display are held in off-screen memory. In
that situation, the high-speed video frame buffer supports
fewer displayable pixels than technically possible according
to available video memory and monitor characteristics.

In addition to video memory constraints, there may also
be restrictions on pixel display characteristics due to the
high-speed video frame buffer's frame buffer logic. That is,
the module in the high-speed video frame buffer system that
generates timing signals for the video display places further
restrictions on the monitor configurations that are supported.
Presently, the maximum vertical period is 1K lines per field
in interlaced stereo, 2K lines per field in frame-sequential
stereo, and 2K lines in monoscopic mode. "Maximum
vertical period" includes the displayed lines, plus the blank
time. Additionally, the minimum horizontal period is 64
pixels. Also, the back end video logic restricts the maximum
frequency of the pixel clock to approximately 160 MHz for
the 1.0 MP and 1.3 MP frame buffers and approximately 220
MHz for the 2.0 MP and 2.6 MP frame buffers.

The present invention is a request-driven graphics system.
Requests are used for operations such as loading registers,
clearing a window, and drawing triangles. There are three
types of requests: graphics requests, context requests, and
control requests. Drawing is accomplished via the DrawVec,
DrawClipVec, and DrawTri requests. Sending graphics data
to the invention is accomplished via fills and puts, and
graphics data is retrieved via get requests. Data is moved
within the system with the blit request (BitBlit). The context
of the system can be changed via the requests that load
registers and data tables. The context of the system can be
observed via the "Read" requests. Control requests exist for
miscellaneous purposes. These requests include the NoOp,
PNoOp, Interlock, SetUserID, and Wait commands.

The Request FIFO may be half-duplex, and if so, after
software issues a request that will return data, it may not
accept further requests until the returned data has been
emptied. If software does not obey this constraint, then a
"FIFO Duplex Error" will result. Requests are further
divided into protected and not protected requests. Protected
requests will not be executed unless they were written to the
[...]. Not protected requests will execute from [...] mapped
into [...] considered a
protected FIFO, and hence can execute protected requests. It
is intended, in preferred embodiments, that for an application
to "direct access" the present invention's hardware, the
application will be able to write to the not protected FIFO,
but not the protected or sync FIFOs. Context switching is
supported at any point in any non-protected request written
to the non-protected FIFO. Protected requests or requests
written to a protected FIFO are not interruptible.

Regarding memory usage, there are several differences
between the way memory is used in prior art products such
as the Edge III, and the way memory is used in the invention.
These differences arise from the prior art's frame buffer
being built from Video RAMs (VRAMs), while the present
invention's frame buffer is built from Synchronous DRAMs
(SDRAMs). The primary reason for the choice of SDRAMs
in the invention is cost. SDRAMs cost less per bit than
VRAMs, while they are available in much higher densities
than VRAMs. Their higher densities allow for more compact
packaging. For example, the 2 Mpixel frame buffer is built
from 136 VRAMs in Edge III, but only 16 SDRAMs in the
invention. As noted hereinabove, an alternate type of RAM
may be utilized instead of SDRAMs, so long as similar
functionality is achieved.

The physical differences between VRAMs and SDRAMs
produced marked differences between the frame buffer
architectures of preferred embodiments over prior art
designs. One major difference between the devices is that
VRAMs are dual-ported, while SDRAMs are single-ported.
The VRAM's additional port is a serial shift register that
provides a path from the frame buffer to the display, while
only minimally impacting bandwidth between the memory
controller and the frame buffer.

Another difference between the two device types is the
relative impact of page crossings (discussed hereinabove). A
characteristic of both types of RAM devices is that they hold
a matrix of memory. Each row in the matrix is referred to as
a page of memory. Accesses to locations within a page can
occur very quickly. When a location to be accessed is
outside the page that is currently being accessed, then the
memory controller must cross the page boundary. A page
crossing involves closing the open page, precharging, and
then opening the new page. Page crossings in SDRAMs are
relatively more expensive than in VRAMs. The actual time
to perform a page crossing is about equal for the two
devices, but the memory interface for a SDRAM may
provide new data to the controller synchronously at speeds
of around 100 MHz, while VRAMs provide new data to the
controller asynchronously from 20 to 30 MHz.

These architecture differences between VRAMs and
SDRAMs allowed several new memory configurations
providing superior performance to that of the prior
art. Such new configurations include using a packed pixel
format, rather than storing a whole pixel in one word of
memory, and mapping pixels to the display in a pixel block
organization, versus a linear mapping scheme. In addition,
the prior art does not utilize SDRAMs for texture memory.
Often regular asynchronous DRAMs are used to contain the
texture memory. In the present invention, to ensure the speed
of the texturing system is comparable with the increased
speed of the frame buffer subsystem, preferred embodiments
hold texture memory in SDRAMs. However, page crossings
are relatively expensive, and to maintain high performance,
texels are arranged into texel blocks (analogous to pixel
blocks).

Note that the widest SDRAM currently available is 16
bits. This width, coupled with the SDRAM's high density,
only allows a 32-bit bus between each Resolver and its
frame buffer memory without wasting memory. If a higher
density RAM was utilized, however, higher bandwidth and
wider buses could be utilized.

Regarding pixel storage, in the prior art a Resolver's wide
data bus provides simultaneous read access to all of a pixel's
single-buffered planes and one set of its double-buffered
planes. Therefore, in one memory cycle, the prior art
Resolver may typically access all of the information relevant
to a pixel. In a preferred embodiment of the present
invention, each Resolver within a Dual Resolver package
may only access 32 bits of data per cycle (due to current
SDRAM width limitations discussed hereinabove). Since a
pixel in a high-performance graphics system is usually
represented by over 100 planes, each Resolver may only
access a fraction of a pixel at one time, so the pixel data must
be stored differently in the invention than used in the prior
art. In preferred embodiments, some words of memory hold
a partial pixel, while other words of memory hold a plane set
for many pixels. This format is called a Packed Pixel format
in the invention.

FIG. 2 shows a comparison between the present invention
and how data is stored in the 2.0 Mpixel Frame Buffer of a
prior art Edge III graphics processor. In a given word in
memory, the Resolver may access one of several possible
plane set combinations. For the invention, the contents are:
For Buffer0, Image (Red, Green, Blue) and Image VLT
Context, Alpha[3:0] for a single pixel, Overlay for 4 pixels,
and FastClear for 32 pixels 202. For Buffer1, Image (Red,
Green, Blue) and Image VLT Context, Alpha[7:4] for a
single pixel, Overlay for 4 pixels, and FastClear for 32 pixels
204. There are also several single buffered planes: Z buffer
for a single pixel; Stencil for 4 pixels; Mask for 8 pixels;
SelectBufferImage for 32 pixels; and SelectBufferOverlay
for 32 pixels 206.

In contrast, in the prior art graphics processor 208,
memory Address 1 holds these planes for pixel 8 of pixel
block 0; Memory Address 83 holds these planes for pixel 8
of pixel block 0; Memory Address 64 holds these planes for
pixels 0, 8, 16, and 24 of pixel block 0; Memory Address 146
holds these planes for pixels 0, 8, 16, and 24 of pixel block
0; Memory Address 80 holds these planes for pixels 0, 8,
16, ... , and 248 of pixel block 0; Memory Address 162
holds these planes for pixels 0, 8, 16, ... , and 248 of pixel
block 0; Memory Address 165 holds these planes for pixel
8 of pixel block 0; Memory Address 228 holds these planes
for pixels 0, 8, 16, and 24 of pixel block 0; Memory Address
244 holds these planes for pixels 0, 8, 16, 24, 32, 40, 48, and
56 of pixel block 0; Memory Address 252 holds these planes
for pixels 0, 8, 16, ... , and 248 of pixel block 0; and
Memory Address 254 holds these planes for pixels 0, 8,
16, ... , and 248 of pixel block 0.

When a preferred embodiment's memory is configured as
indicated for the invention, draws into the frame buffer for
triangle, vector, and PutBlock requests may be very fast
since these operations typically involve a small number of
plane sets. In addition, with single-port SDRAMs, time
spent reading the frame buffer to satisfy screen refresh
requirements subtracts from time available for rendering
into the frame buffer. With a preferred embodiment's pixel
storage method, plane sets that affect the display are
physically separated from plane sets that do not affect the display.
Therefore, when the invention's Resolver reads the frame
buffer to satisfy screen refresh requirements, it need not
[...] unnecessary information.

Regarding pixel mapping in moving information from
[...] VRAMs with a built-in serial shift register, since a linear
address map is convenient to implement. In a linear address
map, a memory address of zero accesses the upper left pixel
on the screen. Increasing memory addresses correspond to
screen locations further to the right until the right edge of the
screen is reached. Further increasing the memory address by
one corresponds to a location on the left edge of the next
scan line.

In such prior art graphics processors, mapping screen
addresses to memory locations is linear oriented. For
example, a page of VRAM memory may hold 512 locations.
If using an Edge III product, all four Resolvers would access
two sets of VRAMs via one data bus, one for each adjacent
pixel on the display. Therefore, one page of VRAM spans
4096 (512*2*4) pixels. The first page of memory accessible
by the combined Resolvers spans from pixel zero on the
screen to pixel 4095. The second page accesses pixels 4096
to 8191, and so on. If the monitor displays 1600 pixels in the
x-dimension, then page zero spans the first two lines of the
display, and 896 pixels on the third line of the display. Page
one then spans from pixel 896 on the third line to pixel 191
on the sixth line, and so on.

In contrast, a preferred embodiment uses a pixel-block
arrangement to map addresses to physical screen coordinates.
Preferably, pixel blocks are 8 rows tall, and their
width is determined by the number of Dual Resolver chips
installed, and the pixel depth chosen. In preferred
embodiments, several configurations are available, and others
could easily be implemented. For a 2 Dual Resolver
configuration at 128 PPP, the pixel block width is 32 pixels.
For a 2 Dual Resolver configuration at 102 PPP, pixel block
width is 40 pixels. For a 4 Dual Resolver configuration at
128 PPP, pixel block width is 64 pixels. And, for a 4 Dual
Resolver configuration at 102 PPP, pixel block width is 80
pixels.

FIG. 3 illustrates a pixel-block mapping for a preferred
embodiment of the invention, and its
assignment of pixels to Resolvers for the 128 PPP embodiment
of the invention. As shown, a Resolver is assigned
every eighth vertical pixel stripe across the screen. (For a
102 PPP embodiment, each Resolver would be assigned
every fourth pixel stripe.)

As discussed hereinabove, page crossings are relatively
more expensive for SDRAMs than for VRAMs. The pixel-block
mapping is configured so as to minimize page crossings
during triangle draws and during surface rendering. The
rationale is that triangles, and vectors to a lesser degree, are
more typically drawn into a rectangular region of the screen,
as opposed to being drawn in a thin horizontal screen slice
that prior-art linear mapping produces. Each pixel block is
wide enough so that page crossings are also reduced during
block-oriented requests, such as puts and gets. Note,
however, that Blits will likely cause page crossings when a
switch is made from the read to the write portion of the blit.
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To further reduce page crossings, the invention exploits 
another feature of SDRAMs (or other memory with similar 
features): they are dual-bank. In dual-bank SDRAMs, two 
different pages in memory may be open at the same time-one 
in each bank (referenced here inbe low as BankA and 
BankB). While one of the banks is being closed and 
reopened, say BankA, a page in BankB may be accessed. 
This effectively hides most if not all of the page crossings in 
BankA. 

For example, in the 128 PPP embodiment, each Resolver
is assigned 64 pixels in one pixel block 302 (eight rows of
eight pixels). The FC0, FC1, HL0 and HL1 (102 PPP case),
SBI, SBO (128 PPP case), and SBH (102 PPP case) plane
sets are each packed such that only two memory words are
required to store each plane set.
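
The arithmetic behind these figures can be sketched as follows. This is only an illustration: the function names are hypothetical, and the sketch assumes that a packed plane set stores one bit per pixel, 32 pixels to a 32-bit word (as the plane lists above give for FastClear and the SelectBuffer planes). The 10-pixel stripe width is the 102 PPP figure stated elsewhere in this description.

```python
from fractions import Fraction

ROWS_PER_BLOCK = 8        # pixel blocks are 8 rows tall
PIXELS_PER_WORD = 32      # assumption: packed plane sets hold 32 pixels per word

# Pixels per row that one Resolver owns inside a pixel block
STRIPE_WIDTH = {128: 8, 102: 10}


def pixels_per_resolver(ppp):
    """Pixels of one pixel block that belong to a single Resolver."""
    return ROWS_PER_BLOCK * STRIPE_WIDTH[ppp]


def packed_words(ppp):
    """Memory words needed to hold one packed plane set for those pixels."""
    return Fraction(pixels_per_resolver(ppp), PIXELS_PER_WORD)


def block_width(dual_resolver_chips, ppp):
    """Pixel block width: two Resolvers per Dual Resolver chip, one stripe each."""
    return 2 * dual_resolver_chips * STRIPE_WIDTH[ppp]
```

Under these assumptions the 128 PPP case gives 64 pixels and 2 words per Resolver, and the four block widths quoted for the 2- and 4-chip configurations (32, 40, 64 and 80 pixels) fall out of the same product.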

When the monitor is in non-interlaced mode, the
ScreenRefresh Module (SRM) in the Dual Resolver chip must
provide a pixel stream that uses every line of pixels in the 
frame buffer. To supply this stream, the SRM receives a 
complete word of SBI (for example) from the memory 
controller. It supplies one row of pixels immediately to 
satisfy the display, and temporarily stores the three other 
rows. On the succeeding scanlines, all of this data is pro- 
vided to the display. When the monitor is placed in inter- 
laced mode, however, the SRM only needs to supply every 
other line of pixels to the display during one frame. The next 
frame consumes the remaining lines. In this case, if the 
memory storage were the same as in the non-interlaced 
mode, the SRM would receive a memory word that only 
contained two useful rows of pixels. Therefore, memory 
would have to be read more often to supply the pixel stream. 
To enhance the efficiency of the frame buffer's bandwidth, 
the pixel rows are stored differently in interlaced mode by
the Resolver.

The 102 PPP case is very similar to this example, with the
exception that each Resolver is responsible for more pixels
per pixel block (80, or 8 rows of 10 pixels), which means
that 2½ words in memory store the packed pixels. The two
storage modes are as shown below for the 4-Resolver case.

Note that the mapping from a pixel location in one frame 
buffer to a pixel location in the other frame buffer just 
requires that the pixel row number be modified such that 
noninterlaced row numbers 0 through 7 map to interlaced 
row numbers 0, 2, 4, 6, 1, 3, 5, and 7. This mapping is
accomplished by a left-circular rotate of the pixel row 
number. This mapping is driven by the packed pixel plane 
sets, but it is also applied to all the other plane sets for a 
consistently-mapped scheme. 
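
The row mapping above can be checked with a few lines of code. This is a minimal sketch, assuming the pixel row number within a block is treated as a 3-bit value; the function name is hypothetical.

```python
def interlaced_row(row):
    """Rotate the 3-bit pixel row number left by one bit position."""
    if not 0 <= row <= 7:
        raise ValueError("row must be in 0..7")
    # Shift left, keep the low 3 bits, and wrap the old top bit around to bit 0.
    return ((row << 1) & 0b111) | (row >> 2)


# Non-interlaced rows 0..7 and the interlaced rows they map to
mapping = [interlaced_row(r) for r in range(8)]
```

Rotating instead of shifting is what sends rows 4 through 7 to the odd positions, so the even rows and the odd rows each occupy one contiguous half of the block.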

FIG. 4a and FIG. 4b show a standard mapping versus a
preferred embodiment's checkerboard mapping. Referring
to FIG. 4a, assume that a scan-line (segment CD) is part of
a PutBlock32. In the prior art, a Resolver might first open
Page n in BankA, draw pixels from left to right until the right
side of the pixel block is reached, close Page n in BankA and
open Page n in BankB, draw pixels from left to right until
the right side of the pixel block is reached, close Page n in
BankB and open Page n+1 in BankA, and then draw pixels
from left to right until point D is reached. However, a faster
way to write the scanline into memory is to hide the page
crossings in the drawing time: open Page n in BankA, and
while drawing pixels in Page n, BankA, open Page n,
BankB, and while drawing pixels in Page n, BankB, close
Page n, BankA and open Page n+1 in BankA, and then draw
pixels in Page n+1 in BankA until point D is reached.
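
The benefit of hiding page crossings can be illustrated with a toy cycle count. This is a sketch under stated assumptions, not a model of the actual hardware: each page crossing is treated as a single fixed cost, the cycle values in any usage are made up for illustration, and both function names are hypothetical.

```python
def serial_cycles(draw_cycles, crossing_cycles):
    """Prior-art order: every page crossing stalls drawing for its full cost."""
    return sum(draw_cycles) + crossing_cycles * len(draw_cycles)


def overlapped_cycles(draw_cycles, crossing_cycles):
    """Dual-bank order: the next page is opened in the other bank while drawing.

    Only the first page open is fully exposed. Each later crossing costs
    just the part that is not covered by the current block's draw time.
    """
    total = crossing_cycles  # the first open cannot be hidden
    for i, draw in enumerate(draw_cycles):
        total += draw
        if i + 1 < len(draw_cycles):
            total += max(0, crossing_cycles - draw)
    return total
```

With three blocks of 8 drawing cycles each and a hypothetical 6-cycle crossing, the serial order takes 42 cycles and the overlapped order takes 30. When a crossing is longer than the draw time of a block, only the excess remains visible.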

FIG. 4b corresponds to a preferred embodiment's intentional
checkerboarding of frame buffer pixel blocks. Both
horizontally and vertically, pixels from opposite banks in
memory are placed into adjacent pixel blocks on the screen,
so that if an even number of pixel blocks fill the screen in the
x-dimension, then all pixel blocks in a vertical line would
fall within different pages of the same SDRAM bank. When the
number of horizontal pixel blocks is odd, however, an
imaginary line drawn in either the horizontal or vertical
directions passes through alternating SDRAM banks. In this
second case, the pixel blocks naturally form a checkerboard
pattern (i.e. two pages within the same bank are never
adjacent to each other). By intentionally addressing memory
differently, the Resolvers always access memory banks in a
checkerboarded fashion. Checkerboarded mapping speeds
up the rendering of triangle meshes in simulation. Also note
that as with frame buffer memory, texture memory is similarly
checkerboarded.
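
The even and odd cases can be illustrated with a small sketch. It assumes, purely for illustration, that an unmodified mapping alternates banks with consecutive pixel block numbers, and that the checkerboard is obtained from the parity of the block coordinates. Neither formula is taken from the actual address decoding, and all names are hypothetical.

```python
def linear_bank(bx, by, blocks_per_row):
    """Bank when banks simply alternate with consecutive block numbers."""
    return (by * blocks_per_row + bx) % 2


def checkerboard_bank(bx, by):
    """Bank chosen so that horizontal and vertical neighbors always differ."""
    return (bx + by) % 2


def vertical_neighbors_differ(bank_of, blocks_per_row, rows):
    """True if every block lies in a different bank from the block below it."""
    return all(
        bank_of(bx, by) != bank_of(bx, by + 1)
        for by in range(rows - 1)
        for bx in range(blocks_per_row)
    )


even = vertical_neighbors_differ(lambda bx, by: linear_bank(bx, by, 4), 4, 3)
odd = vertical_neighbors_differ(lambda bx, by: linear_bank(bx, by, 5), 5, 3)
fixed = vertical_neighbors_differ(checkerboard_bank, 4, 3)
```

With an even number of blocks per row the alternating scheme stacks the same bank vertically, with an odd number it already forms a checkerboard, and the parity formula gives a checkerboard in both cases.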

Another important feature of a preferred embodiment's
memory arrangement is that all of the planes for a particular
pixel are stored on the same physical page (unique row
address) in SDRAM memory. This arrangement enables the
Resolver to access all plane sets of all pixels on the page
without page crossing penalties once a page is opened. If the
screen has fewer pixels than the frame buffer, then off-screen
memory locations are available. For the driver to access that
memory, it must address the locations as if they were
physically below the screen, consistent with the address
mapping of the architecture.

In the prior art, as discussed hereinabove, the address
mapping is linear. Therefore, any off-screen pixels are
mapped in the same linear fashion as the rest of memory. For
example, in the Edge III graphics processor, the frame buffer
always holds 2 Mpixels. Thus, the amount of off-screen
memory varies with the monitor and resolution chosen. A
preferred embodiment has support for such off-screen
memory, but as with the prior art, the amount of off-screen
memory varies according to the monitor, resolution,
and frame buffer configuration. Unlike the prior art,
however, the off-screen memory is grouped into pixel
blocks. Consequently, it is possible that even though there are
many off-screen pixels, there may be no full rows of pixels.

An advantage to the present invention's utilization of the
dual-pages is that the apparent page size of the SDRAMs is
increased, while dynamically altering the physical dimensions
of the pixel block. As a result, objects that are large
enough to span multiple pixel blocks may be drawn more
quickly. Another advantage is that during reads of memory
to satisfy screen refresh requirements, it becomes possible to
hide page crossings. While data is being read from one
SDRAM bank, the page that maps to the next pixel block to
be hit by the raster scan is opened. In preferred
embodiments, this page is always in the opposite SDRAM
bank. And, while reading from the now-open bank, the
previous bank is closed.

In addition to the access methods described hereinabove,
preferred embodiments of the invention also support
interlaced mode. In this configuration, pixels are stored
differently in the frame buffer when in interlaced mode than in
non-interlaced mode. Interlaced mode is enabled by setting
a bit in a control register for the invention. Setting this bit
causes some plane sets for some pixels to be stored
differently.

In a preferred embodiment, the logic for the texture
processor is included in the graphics engine. Therefore, if
texture memory is available, texturing is available.
SDRAMs are used for texture memory instead of the
DRAMs used by the prior art. SDRAMs provide faster
texturing performance.

FIG. 5, FIG. 6 and FIG. 7 show hardware block diagrams
for three general configurations supported by the texture
processor. Preferred embodiments of the invention support
several such memory configurations. Some general features
of the subsystem are apparent. First, the texture processor
accesses texture memory via four independent 32-bit data
and address buses. And, memory is still split into two logical
(and physical) banks: "Texture Memory Even" (TME) and
"Texture Memory Odd" (TMO). And, TME is subdivided
into two sets of SDRAMs: Set0 and Set1. TMO is subdivided
similarly.

For each configuration of texture memory supported in
the invention, the size and number of banks varies depending
on the organization and quantity of SDRAMs that hold
the texture memory. Each one of these organizations presents
a different memory map to applications. The common
features are that the maximum U dimension is fixed at 2K,
the bank bit (B) is stored in an internal register and toggles
access between TME and TMO banks, and a mipmap with
the next-lower level of detail from the current mipmap is
stored in the opposite bank.

In preferred embodiments of the invention, the limiting
factor for maps with borders is that the border information
must be stored in the same bank (Texture Memory Even or
Texture Memory Odd) as the image data.

In the prior art, it was assumed that all lower resolution
mipmaps are stored. However, in the invention it is possible
to load a truncated mipmap set as controlled through the
LOD_CLAMP internal register. LOD_CLAMP may represent
the actual number of mipmaps; if clear, it is assumed
no mipmaps exist for a given texture.

In general, however, the LOD_CLAMP field determines
the final level of detail (LOD) to use when mip mapping.
Normally, this is set to the minimum of U_SIZE (size of
selected texture map in U direction) and V_SIZE (size of
selected texture map in V direction). For example, a preferred
embodiment modifies OpenGL border stores by
storing the actual texture data centered in the next larger
map size. If just min(U_SIZE, V_SIZE) were used, the
texture processor would go one level of detail beyond where
the method still remains correct. Also, as an alternate way to
do borders, a 1Kx1K space may be tiled with 64x64
subblocks. Mipmap sets only exist for the 64x64 blocks.
Normally, the maximum LOD would be 11, but in this case
the maximum LOD should be 7. By default, its value is set
to 0x0, giving the same behavior as U_SIZE and V_SIZE
after warm reset.

The mapping of SDRAM addresses to texel (UV) space is
carefully constructed to allow high texturing performance.
For a pixel to be textured, eight memory locations (eight
texels) must be read: four texels that surround a specific
(u,v) coordinate at one level of detail, and the four texels that
surround that same coordinate at the next lower level of
detail.

FIG. 8 shows how texels are mapped from memory to UV
space. The upper right-hand corner of the figure indicates
that TME is divided into texel blocks. The lower left-hand
corner of the figure shows that each texel block is divided
into texels. Each texel within a texel block has been assigned
one of four symbols. When the texture processor reads a
group of four texels at one level of mipmap detail, it reads
one texel of each symbol type. The rows of texels represented
by the circles and squares are read from Set0 of TME,
while the rows of texels represented by the triangles and
crosses are read from Set1 of TME (refer to the hardware
block diagram in FIG. 5).

FIG. 8, FIG. 9, and FIG. 10 show how texels are mapped
for three general hardware configurations, although other
configurations are possible. In all cases of texture memory
reads, regardless of whether the access is in response to a
read or texturing request, the texture processor reads eight
texels in two read cycles. Normally, this occurs on two
consecutive clock cycles. First clock: 1 texel from TME,
Set0; 1 texel from TME, Set1; 1 texel from TMO, Set0; and
1 texel from TMO, Set1. Second clock: repeat the reads
from the first clock after changing the texel addresses. The
only exception to this pattern occurs on some cases of
texturing with borders. Due to conflicting requests among
SDRAM pages, there is a pause between the first and second
reads while the current pages are closed and new pages are
opened.

As with the frame buffer memory, one page of SDRAM
memory represents one texel block. Since the SDRAMs are
dual-bank, two texel blocks may be open at one time, so the
texture processor may often access texels that straddle two
texel blocks without any page crossing penalties. The texel
blocks are checkerboarded as in the frame buffer memory,
further enabling the texture processor to access texels from
adjacent texel blocks without opening and closing pages in
the middle of the access. In most situations, no matter which
direction the texture is traversed, all texels required may be
accessed without interruption from a page crossing.

For double-buffered rendering, a preferred embodiment
implements WRITE_VISIBLE and READ_VISIBLE
register bits to indicate when a given visual is double-buffered.
When WRITE_VISIBLE is set, fast clear planes
for the associated pixel will be ignored, and will be neither
read nor written when the pixel is accessed. When READ_VISIBLE
is set for a read, then the Resolver will determine the clear
status of the pixel from the VISIBLE FC plane. (Note that
the video display driver should be aware of this
interpretation, since the visible buffer may not own the
construction plane sets.) When these bits are set, a Resolver
must first read the appropriate SelectBufferImage (SBI) or
SelectBufferOverlay (SBO) bit from the frame buffer to
determine which buffer is visible. Also, the ScreenRefresh
module in the Dual Resolver must read these bits to determine
which buffer it should display as it updates the screen.

In a preferred embodiment, double-buffering utilizes two
planes per pixel to control the displayed buffer. The first is
the SelectBufferImage (SBI) plane for the Image planes
(Red, Green, Blue, and Image VLT Context) and the second
is the SelectBufferOverlay (SBO) plane for the Overlay
planes. If SBI is set, the pixel's buffer1 Image planes are
visible, and if SBI is cleared, the pixel's buffer0 Image
planes are visible. Likewise, if SBO is set, the pixel's
buffer1 Overlay planes affect the display, and buffer0 if
SBO is clear.

These operations would be faster if the Resolver already
knew which buffer was visible to avoid first performing a
read. Towards this end, a preferred embodiment supports
Displayed-Buffer Detection (DBD). The usefulness of this
feature relies on the assumption that for well-behaved cases,
all of the pixels on the screen are displaying buffer0. This
condition will be true before any application begins
double-buffering; while one application is double-buffering, and it is
currently displaying buffer0; while many applications are
double-buffering, and all of them are currently displaying
buffer0; or after applications have stopped double-buffering,
and the device driver cleans up all SBI and SBO bits to point
to buffer0.

DBD determination occurs as the ScreenRefresh module
in the Dual Resolver must determine which buffer is displayed
for every pixel on the screen as it is filling its pixel
FIFO. If an entire screen is updated from buffer0, the
ScreenRefresh module may set a flag to the Resolvers,
signaling that READ_VISIBLE actually means "read from
buffer0", etc. If the Resolver modules interacting with the
ScreenRefresh module detect that one of the SBI or SBO bits
has been written with a "1", then they may reset the flag,
forcing reads of the frame buffer to resume for visible
determination. This flag may also be monitored by the
ScreenRefresh module itself so that it may avoid reading the
SBI and SBO bits during the next pass of the screen update.

Another feature, referenced herein as All-Normal
Detection, is similar to Displayed-Buffer Detection discussed
above. When the FastClearEnable register has at least
one plane set enabled, there is a possibility that a FC bit for
one of the pixels on the screen is set. Each Dual Resolver
chip has two registers that hold FastClearEnable (FCEn)
bits. Each FCEn bit corresponds to a plane set or an
individual plane. If a plane's FCEn bit is disabled, then the
polarity of the FC bit does not affect the interpretation of that
plane. For all enabled planes, the FC bit determines whether
the frame buffer's contents or the clear value represent a
pixel. When the FC bit is set, the Resolver holds the pixel's
clear value for all of the ENABLED (FCEn=1) plane sets.
Therefore, if one plane set is enabled in the FCEn register,
the Resolver must first read the FC bit for a pixel on read,
or read/modify/write operations to determine the pixel's
effective contents. Extra reads during a rendering cycle slow
down performance. Also, the ScreenRefresh module in the
Dual Resolver must read the FC bits to determine the
effective contents of a pixel as it updates the screen.

In preferred embodiments, performance is enhanced
through All-Normal Detection. As with Displayed-Buffer
Detection discussed above, by linking this function to the
ScreenRefresh module's functions, preferred embodiments
of the invention may detect the presence of any set FC bits
on the displayed pixels of the screen at least 76 times per
second, and preferably at speeds of at least 85 Hz.

A preferred embodiment also implements a FastClear
Cache. Once any bit is set in the FCEn register, the FC bits
for a pixel must be evaluated before the pixel may be
accurately manipulated. When a pixel is written, the FC bit
for the pixel must be reset. Performing these reads and
writes takes memory cycles that could otherwise be dedicated
to rendering. Furthermore, these "read/modify/writes"
have a tendency to break pixels that could otherwise be
bursted together into many smaller bursts. To minimize this
impact, each Resolver module in the invention holds a
FastClear cache.

There are enough locations in each Resolver's FastClear
cache to hold all of one buffer's FC bits for two open pages,
or 10*8*2=160 bits. The cache is actually held in an
8-word x 32-bit RAM. Four words are unused in the 128 PPP
configuration, where the cache only needs to hold 8*8*2=128
bits, and two words are unused in the 102 PPP
configurations. The Resolver may fill and flush this cache much
more quickly (for most operations) than updating one pixel
at a time in memory.

The Resolver normally fills and flushes the FastClear
cache during page crossings. On opening a page, the
Resolver fills the cache for that bank. During accesses to the
open page, the Resolver updates the cache locations instead
of the FC locations in the frame buffer. When a page is to be
closed, the Resolver flushes the appropriate cache lines,
updating the frame buffer. A cache line holds the data that is
stored in one word (32 bits) of memory. In preferred
embodiments, the fill algorithm for a full cache is simply to
completely fill the cache regardless of the request. All lines
in the cache are set to clean. Any lines touched by requests
while the page remains open are marked as dirty. When a
page is scheduled to be closed, the invention Resolver must
first flush the cache (while the page is still open) to make
room for the FC bits for the next page. To determine which
lines in the cache to flush, the invention Resolver examines
the dirty bit for each line. All lines that are dirty are flushed.
After flushing the caches, the Resolver marks all lines as
clean (it does not destroy the contents of the cache).

Normally the Resolver accesses the FC bits in their caches
instead of frame buffer memory. Preferred embodiments of
the invention allow for the Resolver, if necessary, to manipulate
these bits in the frame buffer directly instead of within
the confines of the caches. An example of when this would
be necessary is when the bits are read to fill the ScreenRefresh
module's pixel FIFO.

In a preferred embodiment, an additional bit has been
added to the IZ data format for the Fill Header: bit
25, BLOCK_FILL. When this bit is set, the fill request
applies to an eight-scanline block. The scanlines affected by
the request are the current line indicated by the address in the
first header word and the next seven scanlines. The spans on
all eight scanlines begin at the same x coordinate on the
screen. This allows the Resolver 134, 136 (FIG. 1) to
accelerate the manipulation of the FC, SBI, and SBO bits via
Visual 7, since these bits are all packed many pixels per
word in the frame buffer. Since most requests will not be
aligned to the invention's pixel block boundaries, the IZP
must handle the top and bottom conditions of the rectangular
region. At the top, the IZP sends single-scanline requests
until it reaches a horizontal pixel block boundary. It then
sends multiple-scanline requests via BLOCK_FILL mode
until the number of scanlines remaining in the request is less
than eight (the height of a pixel block). The IZP then
resumes sending single-scanline requests until it completes
the fill.

In a preferred embodiment each Resolver 134, 136 is not
assigned adjacent pairs of pixels. In a 4-Resolver configuration
(2 Dual Resolver chips), each Resolver covers every
fourth pixel. In the 8-Resolver configuration (4 Dual
Resolver chips), each Resolver covers every eighth pixel.
Since Resolvers are packaged in pairs in the invention, a
package covers every other pixel or every fourth pixel for
the 4- and 8-Resolver configurations, respectively.

As described hereinabove, each Resolver module within
a Dual Resolver device controls a pair of Synchronous
DRAMs (SDRAMs). The preferred memory devices for the
frame buffer 116 (FIG. 1) are 1 Meg x 16 SDRAMs, but the
Resolvers support 2 Meg x 8 SDRAMs in case they are more
available for prototype checkout than the x16s, and could be
designed to support other memory configurations if necessary.
Regardless of the memory used, the feature subset of
such memory most important to the frame buffer architecture
of the invention includes: pipeline mode, the ability to
issue a new column command on every clock cycle;
dual-bank, the memory array is divided into two equal halves,
each of which may have a page open with an independent
page address; pulsed RAS; high-speed, at least a 100 MHz
clock rate; low-voltage; LVTTL signaling interface; support
for 4096 pages of memory; support for a page size of 256
locations; full-page burst length; CAS latency of 3; DQM
Write latency of zero, DQM Read latency of 2; and preferably
packaged in a 400 mil, 50 pin TSOP II package.

When accessing the frame buffer 116 (FIG. 1) when there
is a draw request, the Resolver's memory controller tries to
satisfy the request via onboard logic referred to herein as the
Burst Builder. The Resolver's Burst Builder groups
sequences of reads and writes to the frame buffer into bursts
to use the SDRAM interface more efficiently.

Fundamentally, a burst is a sequence of transactions that
occur without intervening page crossings. The general
structure of a burst is as follows: [Page Commands|Read
Requests][Read→Write Transition|Write Requests].

Implied by this format is that all page requests (Close,
Open) are performed before the burst is started. Also implied
is that all read requests for all the pixels in the burst will be
completed before the SDRAM bus is reversed. After the bus
is reversed, all the write requests for all the pixels in the
burst will be completed. By minimizing the number of dead
clocks on the SDRAM bus incurred from switching the bus
frequently from a read to a write, performance is optimized.

The Burst Builder generates one list of required plane sets
to be accessed for all pixels in the burst. Two pixels may be
placed in the same burst only if certain conditions are true:
(1) Only one page in a SDRAM bank may be opened at one
time. A JEDEC-standard SDRAM is dual-banked.
Therefore, only pixels destined for the same pair of pages
(one from each SDRAM bank) may be bursted together; (2)
If a read/modify/write is required for a pixel, then only one
access to that pixel is allowed within the same burst; and (3)
If a plane set to be written for two pixels lies within the same
byte in memory (for example, Mask in 2 Mpixel), then those
two pixels must be split into separate bursts.
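
The three conditions can be read as a splitting rule over a stream of accesses. The following is a minimal sketch of such a rule, not the Burst Builder's actual logic: the `Access` record, its field names, and the greedy in-order grouping are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Access:
    pixel: int                        # hypothetical pixel identifier
    bank: str                         # "A" or "B"
    page: int                         # page (row address) within that bank
    rmw: bool = False                 # access needs a read/modify/write
    write_byte: Optional[int] = None  # byte holding a packed plane set to write


def build_bursts(accesses):
    """Greedily split accesses into bursts obeying conditions (1) to (3)."""
    bursts, current = [], []
    pages, pixels, rmw_pixels, bytes_written = {}, set(), set(), {}

    for a in accesses:
        conflict = (
            pages.get(a.bank, a.page) != a.page        # (1) one open page per bank
            or a.pixel in rmw_pixels                   # (2) pixel already has an RMW
            or (a.rmw and a.pixel in pixels)           # (2) RMW on a pixel already seen
            or bytes_written.get(a.write_byte, a.pixel) != a.pixel  # (3) shared byte
        )
        if conflict:
            # Close the current burst and start a new one with fresh state.
            bursts.append(current)
            current = []
            pages, pixels, rmw_pixels, bytes_written = {}, set(), set(), {}
        current.append(a)
        pages[a.bank] = a.page
        pixels.add(a.pixel)
        if a.rmw:
            rmw_pixels.add(a.pixel)
        if a.write_byte is not None:
            bytes_written[a.write_byte] = a.pixel
    if current:
        bursts.append(current)
    return bursts
```

In this sketch a burst may touch one page in each bank, so accesses to BankA page 5 and BankB page 5 stay together, while a later access to BankA page 6 starts a new burst.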

Except for interruptions from screen refresh, page crossings
will only be performed at the beginning of a burst. If the
Burst Builder indicates that Fast Clears are necessary for a
burst, then the FastClear cache will be filled when the page
is opened. Also, if a page is scheduled to be closed before the
next burst begins, the Resolver will flush any dirty pages in
the cache before closing the current page. Therefore, a more
general format for bursts is as follows: [Flush Cache|Page
Commands|Fill Cache|Read Requests] . . . [Read→Write
Transition|Write Requests].
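
The fill, update and flush behavior of the FastClear cache around a page crossing can be sketched as a small write-back cache with one dirty bit per 32-bit line. This is a toy model only: the class and method names are hypothetical, and the number of lines and the bit order within a word are assumptions.

```python
class FastClearCache:
    """Toy model of one bank's FastClear cache: 32-bit lines with dirty bits."""

    def __init__(self, lines=4):
        self.lines = [0] * lines
        self.dirty = [False] * lines

    def fill(self, memory_words):
        """On page open: copy the page's FC words in and mark every line clean."""
        self.lines = list(memory_words)
        self.dirty = [False] * len(self.lines)

    def clear_fc_bit(self, pixel_index):
        """On pixel write: reset the pixel's FC bit in the cache, not in memory."""
        line, bit = divmod(pixel_index, 32)
        self.lines[line] &= ~(1 << bit) & 0xFFFFFFFF
        self.dirty[line] = True

    def flush(self, memory_words):
        """Before page close: write back dirty lines only, then mark all clean.

        The cache contents are left intact, as the text describes.
        """
        written = []
        for i, is_dirty in enumerate(self.dirty):
            if is_dirty:
                memory_words[i] = self.lines[i]
                written.append(i)
        self.dirty = [False] * len(self.lines)
        return written
```

Only the lines touched while the page was open reach memory on a flush, which is what lets the FC updates ride along with the page crossing at the start of a burst instead of costing a read/modify/write per pixel.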

When storing OpenGL texture borders, the border data is
stored in texture memory along with the texture data, and it
may be thought of as a collection of single maps. For this
discussion, it is assumed that:

1. The currently active border is defined by the
GE_TEX_BDR_ORG register.

2. The V coordinate of GE_TEX_BDR_ORG is an integral
multiple of 64. This is regardless of map size or if
mipmapping is enabled.

3. The U coordinate of GE_TEX_BDR_ORG is an integral
multiple of the base map U size.

4. The border for a map must be stored in the same texture
memory bank (Texture Memory Odd or Texture Memory
Even) as the associated texture data. For mipmaps, this
means the borders swap banks along with the normal
texture image data.

5. A group of 8 lines is required to store the borders for a
map. Within a bank of texture memory, 8 such groups are
possible since the V coordinate of GE_TEX_BDR_ORG
must be an integral multiple of 64.

6. For each border group, the 8 lines are defined as follows:
Line 0: Bottom border; Line 1: Top border; Line 2: Left
border, copy 0; Line 3: Left border, copy 1; Line 4: Right
border, copy 0; Line 5: Right border, copy 1; Line 6:
Corner borders, copy 0; Line 7: Corner borders, copy 1.

Multiple copies of some borders are used due to the layout
of texture memory. A unified set of rules is given below for
border storage that does not depend on the type of synchronous
DRAMs that are used. Not all the border texels will
always be accessed according to the rules, but all 8 lines of
border storage are required with the current texture memory
layout.

For simplicity, BDR.u and BDR.v are the U and V values
respectively from the GE_TEX_BDR_ORG register.



Key:

stored in order, indicates a span >= 1
T  top border            B  bottom border
L  left border           R  right border
TL top-left corner       TR top-right corner
BL bottom-left           BR bottom-right



The following diagram shows border storage for a single
map or the base map from a mipmap set. If the V map size
is =1, then follow the rules in the next section for intermediate
mipmaps.

BDR.v + 0:  BOTTOM
BDR.v + 1:  TOP
BDR.v + 2:  LEFT 0
BDR.v + 3:  LEFT 1
BDR.v + 4:  RIGHT 0
BDR.v + 5:  RIGHT 1
BDR.v + 6:  CORNER 0   TL TR BL BR
BDR.v + 7:  CORNER 1   TL TR BL BR

The bottom and top rows are stored at V addresses of
BDR.v+0 and BDR.v+1. The U addresses start with BDR.u
and increment once every texel until the entire width of the
map is loaded. The left and right borders are each duplicated
on two rows. The two rows for the left border are loaded at
BDR.v+2 and BDR.v+3, and the two rows for the right
border are loaded at BDR.v+4 and BDR.v+5. The U
address BDR.u corresponds to the top of the map and
increments as the border is traversed from top to bottom.
Finally, the corner texels are duplicated on rows BDR.v+6
and BDR.v+7. Beginning at U address BDR.u, the corners
are stored in order top-left, top-right, bottom-left,
bottom-right.
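
The row assignments just described can be collected into a small lookup. This is an illustrative sketch only: the function name is hypothetical, and the texel counts for the left and right rows (one texel per map row, traversed top to bottom) are an inference from the paragraph above rather than a stated figure.

```python
def base_map_border_layout(bdr_u, bdr_v, u_size, v_size):
    """Return {border line: (V address, first U address, texel count)}."""
    return {
        "BOTTOM":   (bdr_v + 0, bdr_u, u_size),
        "TOP":      (bdr_v + 1, bdr_u, u_size),
        "LEFT 0":   (bdr_v + 2, bdr_u, v_size),   # inferred length: one texel per row
        "LEFT 1":   (bdr_v + 3, bdr_u, v_size),
        "RIGHT 0":  (bdr_v + 4, bdr_u, v_size),
        "RIGHT 1":  (bdr_v + 5, bdr_u, v_size),
        "CORNER 0": (bdr_v + 6, bdr_u, 4),        # TL, TR, BL, BR in that order
        "CORNER 1": (bdr_v + 7, bdr_u, 4),
    }


layout = base_map_border_layout(bdr_u=0, bdr_v=64, u_size=16, v_size=8)
```

For a 16x8 base map with the border origin at V = 64, the second copy of the right border lands on V address 69 and both corner rows hold four texels starting at BDR.u.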

Regarding storage for Intermediate Mipmaps, if the current
V map size is >1, the border storage is very similar to
what is described above. Note that the texture memory bank
is swapped for every increasing integer LOD value. The
border storage is in the same bank as the associated texture
data. The base V address for the border group is found
according to the equation base_v = BDR.v + (lod/2)*8. The
bank for base_v is the same as for the corresponding
mipmap. The order of the rows and the U addresses are as
follows:



             BDR.u
base_v + 0:  BOTTOM
base_v + 1:  TOP
base_v + 2:  LEFT 0
base_v + 3:  LEFT 1
base_v + 4:  RIGHT 0
base_v + 5:  RIGHT 1
base_v + 6:  CORNER 0   TL TR BL BR
base_v + 7:  CORNER 1   TL TR BL BR
                        +0 +1 +2 +3
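
The base address rule for intermediate mipmaps can be sketched as follows. Several points are assumptions rather than statements from the text: the division in (lod/2) is taken to be integer division, the base map is taken to be LOD 0 and to live in the bank passed as `base_bank`, and the function name is hypothetical.

```python
GROUP_LINES = 8       # each border group occupies 8 lines of V
GROUPS_PER_BANK = 8   # 64 / 8, since BDR.v is an integral multiple of 64


def border_group(bdr_v, lod, base_bank="TME"):
    """Return (bank, base_v) for the border group of mipmap level `lod`."""
    if bdr_v % 64 != 0:
        raise ValueError("BDR.v must be an integral multiple of 64")
    group = lod // 2  # assumption: integer division
    if not 0 <= group < GROUPS_PER_BANK:
        raise ValueError("LOD does not fit in the 8 border groups of a bank")
    other_bank = "TMO" if base_bank == "TME" else "TME"
    # The bank swaps on every LOD step, so two consecutive LODs share a base_v
    # while sitting in opposite banks.
    bank = base_bank if lod % 2 == 0 else other_bank
    return bank, bdr_v + group * GROUP_LINES
```

Under these assumptions LOD 0 and LOD 1 both use base_v = BDR.v but in opposite banks, and the group only advances by 8 lines every second level, which is consistent with 8 groups per bank fitting inside a 64-line region.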



Regarding storage for Final Mipmap, a slightly different
situation exists when the current V map size is =1 due to the
way texture memory is interleaved in V between Set 0 and
Set 1. Simply said, the border data must now come from the
opposite set from that in which the texture data are stored.
The map is now 1 texel high, but this texel could be stored
in either Set 0 or Set 1. To compensate, the top and bottom
texels must also be duplicated in Set 0 and Set 1. This
condition did not exist for the previous cases.

In addition, the left and right borders must be duplicated 
as well if V map size is 1. Although not imposed by the 
physical arrangement of texture memory, this simplifies the 
hardware address translation. Starting at BDR.u, the same 
border value is stored into two adjacent locations for each of 
these rows. 

The above rules for intermediate mipmaps are still used
for borders when the V map size =1. The locations of the
duplicate top and bottom borders vary depending on the
current level of detail.

Although various exemplary embodiments of the inven- 
tion have been disclosed, it should be apparent to those 
skilled in the art that various changes and modifications can 
be made which will achieve some of the advantages of the 
invention without departing from the true scope of the 
invention. These and other obvious modifications are
intended to be covered by the appended claims. 

We claim: 

1. An apparatus for displaying a graphical image on a 
display device having a plurality of pixels, the apparatus 
comprising: 

a frame buffer that stores image data associated with the 
graphical image, the frame buffer including a plurality 
of consecutive address locations; and 

a first processor that processes image data for a first set of 
stripes, each stripe in the first set of stripes being a 
plurality of contiguous pixels on the display device, 
each stripe in the first set of stripes being noncontigu- 
ous with the other stripes in the first set of stripes, 

the first processor placing the image data for the first set 
of stripes in a first set of consecutive address locations 
in the frame buffer. 

2. The apparatus as defined by claim 1 further comprising: 

a second processor that processes image data for a second 
set of stripes, each stripe in the second set of stripes 
being a plurality of contiguous pixels on the display 
device, each stripe in the second set of stripes being 
noncontiguous with the other stripes in the second set 
of stripes, 

the second processor storing the image data for the second 
set of stripes in a second set of consecutive address 
locations in the frame buffer, 

the first set of stripes having no common stripes with the 
second set of stripes. 




3. The apparatus as defined by claim 2 wherein the first 
processor and the second processor are resolvers. 

4. The apparatus as defined by claim 1 wherein the first set 
of consecutive address locations includes consecutively 
stored intensity data. 

5. An apparatus for displaying a graphical image on a 
display device having a plurality of pixels, the display 
defining a plurality of contiguous pixel blocks that each 
include a plurality of contiguous pixels, the apparatus 
comprising: 

a first processor that processes graphical image data for a 
first set of stripes, each stripe in the first set of stripes 
being a plurality of contiguous pixels within a single 
one of the pixel blocks; and 
a second processor that processes graphical image data for 
a second set of stripes, each stripe in the second set of 
stripes being a plurality of contiguous pixels within a 
single one of the pixel blocks; 
the first processor and second processor processing 
different stripes in a given pixel block. 

6. The apparatus as defined by claim 5 further comprising 
a third processor that processes graphical image data for a 
third set of stripes, each stripe in the third set of stripes being 
a plurality of contiguous pixels within a single one of the 
pixel blocks, 

the first, second and third processors processing different 
stripes in a given pixel block. 

7. A method of processing graphical image data for 
display on a display device having a plurality of pixels, the 
method comprising: 

dividing the display device into a plurality of blocks, each 
block including a plurality of contiguous pixels; 

defining a plurality of stripes within each of the plurality 
of blocks; 

assigning a first set of stripes to a first processor; 

assigning a second set of stripes to a second processor, the 
first set of stripes having no common stripes with the 
second set of stripes, 

controlling the first processor to process the first set of 
stripes; and 

controlling the second processor to process the second set 
of stripes. 

8. The method as defined by claim 7 wherein the stripes 
are 8-by-1 pixel wide. 
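
By way of illustration only, the steps of the method of claims 7 and 8 can be sketched in Python as follows. The 8-by-8 block size, the alternating assignment of stripes to processors, and all names are assumptions introduced for this sketch; the claims themselves require only that the two sets of stripes have no stripe in common.

```python
BLOCK_W = 8   # block width in pixels (hypothetical choice)
BLOCK_H = 8   # block height in pixels (hypothetical choice)
STRIPE_H = 1  # each stripe is 8-by-1 pixels, as in claim 8


def stripes_in_block(block_x, block_y):
    """Return the stripes of one block as (x, y, width, height) tuples."""
    x0 = block_x * BLOCK_W
    y0 = block_y * BLOCK_H
    return [(x0, y0 + row, BLOCK_W, STRIPE_H) for row in range(BLOCK_H)]


def assign_stripes(display_w, display_h, num_processors=2):
    """Divide the display into blocks and hand out stripes round-robin.

    Stripe i of every block goes to processor i % num_processors, so the
    per-processor sets are disjoint (an assumed policy, not the claimed one).
    """
    assignments = [[] for _ in range(num_processors)]
    for by in range(display_h // BLOCK_H):
        for bx in range(display_w // BLOCK_W):
            for i, stripe in enumerate(stripes_in_block(bx, by)):
                assignments[i % num_processors].append(stripe)
    return assignments
```

With a 16-by-8 pixel display, assign_stripes(16, 8) yields two lists of eight stripes each, and every block contributes stripes to both processors.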

***** 